OpenAI o1: The AI Model That "Thinks" Before Responding
An AI model that reasons before responding, outperforming human experts on complex tests. Discover how this technology could change the way we interact with AI.
Back in September 2024, OpenAI launched its new series of AI models called OpenAI o1, with o1-preview being the first in the series. What's special about it? Well, imagine a super intelligent friend who, instead of blurting out the first thing that comes to mind, takes a moment to think carefully before responding. That's basically what o1-preview does.
What is this "reasoning" thing for an AI?
When I talk about an AI "reasoning," I'm not referring to it suddenly becoming conscious and philosophizing about the meaning of life (although who knows, maybe someday...). What it really means is that the model has been trained to process information in a way more similar to how we humans would do it.
Remember when you were in school and your teacher asked: "If a train leaves city X at 10:00 AM and travels at 200 km/h toward city Y, which is 620 km away, at what time will it arrive?" You probably wouldn't answer immediately. First, you'd think, "Ok, I need to calculate the travel time. For that, I divide the distance by the speed..." This internal process is what we call a "chain of thought."
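To make that mental math concrete, here's the school problem worked out step by step in a few lines of Python. This is just a toy illustration of the kind of intermediate reasoning involved, not anything the model actually runs; the departure date is arbitrary.

```python
from datetime import datetime, timedelta

# The train problem: leaves at 10:00 AM, travels at 200 km/h,
# toward a city 620 km away. When does it arrive?
distance_km = 620
speed_kmh = 200
departure = datetime(2024, 9, 12, 10, 0)  # date chosen arbitrarily

# Step 1: travel time = distance / speed
travel_hours = distance_km / speed_kmh  # 3.1 hours, i.e. 3 h 6 min

# Step 2: add the travel time to the departure time
arrival = departure + timedelta(hours=travel_hours)

print(f"Travel time: {travel_hours} hours")
print(f"Arrival: {arrival.strftime('%I:%M %p')}")  # 01:06 PM
```

Breaking the problem into those two explicit steps is exactly the "first compute the travel time, then add it to the departure time" chain of thought described above.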
o1-preview does something similar. Before responding, it generates a kind of internal dialogue where it breaks down the problem, considers different approaches, and corrects its own mistakes. It's like having a mini conversation with itself before answering you - debating options and checking its work, just like you might do inside your own head.
The interesting thing is that this "thinking" process improves the more time the model is given to think. It's like when someone tells us "take your time to respond." Generally, the more time we take, the better we respond.
But is it really that good?
When a new AI model is launched, we always ask ourselves: "How good is it really?" Well, to measure the performance of these models, experts use a series of standardized tests, like in school.
I talked a bit about this in my post How do we know how intelligent an artificial intelligence is?, so I recommend taking a look.
It turns out that the o1 series has scored very well on these "exams." Let's look at some examples (OpenAI reported these headline numbers for the full o1 model, of which o1-preview is an early version):
In a qualifying exam for the International Mathematical Olympiad (something like the Olympics for mathematical geniuses), GPT-4o (the previous model) correctly solved only 13% of the problems. o1? No less than 83%. It's like going from failing to being top of the class.
In competitive programming contests, o1 reached the 89th percentile. In simple terms, this means it performed better than 89% of the human programmers who participate in these competitions. Not bad, huh?
In tests evaluating knowledge of physics, chemistry, and biology at the doctoral level, o1 outperformed human experts with PhDs. Yes, you read that right. In some areas, this AI model is solving problems better than people who have dedicated years to studying these subjects.
But don't worry, this doesn't mean AI is about to replace all scientists and mathematicians. Rather, it's an incredibly powerful tool that can help in solving complex problems.
You can see more about the "exams" they gave the model in this OpenAI post.
Note: This is an adaptation of an article written in September 2024. Since then, OpenAI has released even more advanced models, including o3-mini, which further builds on these capabilities.
Safety first
Now, whenever we hear about a super powerful AI, it's natural to worry a little bit. After all, what if it realizes that humans are a threat to the planet and starts creating a plan to get rid of us? (Relax, I'm joking... more or less... I think).
In all seriousness, safety is a key issue when it comes to AI. OpenAI says they've developed a new safety training approach that leverages o1-preview's reasoning capabilities to better follow safety guidelines.
What does this mean in practice? Well, one way they measure safety is by testing how well the model follows its safety rules if a user tries to make it do something it shouldn't (what they call "jailbreaking" - essentially trying to break the AI's safety rules to make it generate harmful content). In one of their toughest jailbreaking tests, GPT-4o scored 22 (on a scale of 100), while o1-preview scored 84. It's like going from having a slightly overweight bodyguard to having Batman by your side.
o1-mini: The speedy little brother
This is like the compact version of o1-preview. It's faster, cheaper (80% less costly than o1-preview) and especially good at programming tasks. If o1-preview is like a mad scientist who can solve any problem but takes time to do it, o1-mini is like a programmer on their fifth coffee who writes code faster than Trinity from The Matrix – lightning-quick and incredibly efficient.
My personal experience
Now, I know you're waiting for me to tell you if this is really a giant leap forward or just another publicity stunt. The truth is that I've had the opportunity to try it briefly and, although it seems promising, I'm still forming my opinion.
What I can tell you is that the idea of an AI that "thinks" before responding is incredibly interesting. Imagine that your assistant doesn't just respond quickly, but actually takes the time to consider the best answer. That could change the way we interact with AI in our day-to-day lives.
Note: Looking back now, these reasoning models were indeed a significant step forward in AI development, leading to the more advanced models like o3-mini that we have today.
What now?
This is just the beginning. OpenAI says they plan to continue improving both the o1 series and the GPT series. They're also working to add features like web browsing, file and image uploads, and other capabilities to make these models even more useful for everyone.
If you have access, my advice is: try it out. Play with it. See what it can do. If you don't have access, don't worry. Technology advances quickly and before you know it, models like these will be available to everyone.
Until next time!
Germán
Note: Looking back at this post from today's perspective, this prediction was accurate - we've seen rapid development with even more advanced models like o3-mini now accessible to a wider audience, each building on the reasoning capabilities first introduced with o1.
Hey! I'm Germán, and I write about AI in both English and Spanish. This article was first published in Spanish in my newsletter AprendiendoIA, and I've adapted it for my English-speaking friends at My AI Journey. My mission is simple: helping you understand and leverage AI, regardless of your technical background or preferred language. See you in the next one!