The development of enhanced reasoning capabilities has become a central focus in AI research, marked by notable advancements in recent months. DeepSeek R1 represents a significant milestone in this field, demonstrating reasoning abilities that match those of OpenAI's latest models.

Thinking Fast and Slow

Drawing from psychologist Daniel Kahneman's cognitive framework, we can understand the challenge of AI reasoning through the lens of System 1 and System 2 thinking.

In human cognition, these two modes run in parallel. System 1 runs automatically—like when you recognize a friend's face or know to brake at a red light. It's quick, effortless, and shaped by experience.

System 2 kicks in for harder tasks, like solving a complex math problem or weighing pros and cons of a major decision. This deliberate thinking demands focus and mental energy.

| System 1 (Fast Thinking) | System 2 (Slow Thinking) |
| --- | --- |
| Fast, automatic, immediate | Slow, deliberate, methodical |
| Unconscious, operates below awareness | Conscious, involves active reasoning and awareness |
| Error-prone: subject to biases and systematic errors | Reliable: can detect and correct errors when sufficient attention is applied |

Auto-regressive LLMs such as GPT-4 can be considered System 1 thinkers: they statistically predict the most likely next token and lack the central planning needed to solve complex problems.

LLMs predicting the next word
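
To make this concrete, here is a minimal sketch of greedy autoregressive decoding, using the Hugging Face transformers library with GPT-2 as a small stand-in model (the prompt and token count are arbitrary). At each step the model scores every vocabulary token and simply commits to the single most likely one, with no planning or lookahead:

```python
# A sketch of System 1-style generation: greedy next-token prediction.
# Uses GPT-2 as a small stand-in; any causal LM works the same way.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

prompt = "The capital of France is"
input_ids = tokenizer(prompt, return_tensors="pt").input_ids

with torch.no_grad():
    for _ in range(5):                            # generate five tokens, one at a time
        logits = model(input_ids).logits          # a score for every vocabulary token
        next_id = logits[0, -1].argmax()          # commit to the single most likely token
        input_ids = torch.cat([input_ids, next_id.view(1, 1)], dim=1)

print(tokenizer.decode(input_ids[0]))             # the prompt plus five predicted tokens
```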

To solve complex problems, machines need to "think" in steps: generate multiple candidate approaches, then evaluate and validate them before committing to an answer.

Inference-Time Scaling

Inference-Time Scaling is a growing trend in AI that tackles System 2 thinking. Instead of generating the answer in one swift pass, the model generates multiple answer “drafts” and reviews them before settling on a final result. The step-by-step reasoning behind these drafts is known as Chain-of-Thought, or CoT.

Source: Google - Evolving Deeper LLM Thinking https://arxiv.org/abs/2501.09891
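
A minimal sketch of the idea, in its best-of-N form: sample several candidate drafts, then have the model review them and settle on one. This assumes the OpenAI Python client with an API key in the environment; the model name and question are placeholders:

```python
# Best-of-N sketch of inference-time scaling: draft, then review.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

QUESTION = ("A bat and a ball cost $1.10 in total. The bat costs $1.00 "
            "more than the ball. How much does the ball cost?")

# Step 1: generate four independent drafts at a non-zero temperature.
drafts = client.chat.completions.create(
    model="gpt-4o-mini",  # placeholder model; any chat model works
    messages=[{"role": "user", "content": QUESTION}],
    n=4,
    temperature=0.9,
)
candidates = [choice.message.content for choice in drafts.choices]

# Step 2: review the drafts and settle on a final answer.
review = ("Here are several candidate answers:\n\n"
          + "\n---\n".join(candidates)
          + "\n\nCompare them for correctness and give the single best final answer.")
final = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": review}],
    temperature=0,
)
print(final.choices[0].message.content)
```

Nothing about the underlying model changes here; the knob being turned is simply how much compute is spent per question at inference time.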

Modern AI reasoning models, such as OpenAI's o1 and now DeepSeek R1, use CoT to break down their thinking into explicit steps before generating a final answer. Think of it like showing your work on a math problem: the model works through intermediate reasoning steps and considers different approaches before arriving at its conclusion. This internal deliberation significantly enhances the model's ability to handle complex problems and produce well-reasoned answers.

The number of internal reasoning tokens can far exceed the tokens needed to output the result. There appears to be a strong correlation between the number of intermediate tokens used and the sophistication of the model's response. This explains why o1, which uses extensive internal reasoning, can generate higher quality answers than GPT-4, but comes with higher computational costs due to the additional processing required for these internal steps.

Source: Artificial Analysis (https://x.com/ArtificialAnlys/status/1836064359237202172)
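
Since these intermediate reasoning tokens are billed as output tokens, the economics are easy to estimate. Here is a back-of-the-envelope sketch using the per-million-token output prices quoted in the table later in this article; the token counts are illustrative assumptions, not measured values:

```python
# Rough cost of hidden reasoning, using the output prices quoted in the
# comparison table below ($60 per million tokens for o1, $2.19 for R1).
reasoning_tokens = 8_000   # hidden chain-of-thought tokens (assumed)
answer_tokens = 400        # tokens the user actually sees (assumed)

for name, usd_per_million in [("o1", 60.00), ("DeepSeek R1", 2.19)]:
    cost = (reasoning_tokens + answer_tokens) * usd_per_million / 1_000_000
    print(f"{name}: ${cost:.4f} per response, with "
          f"{reasoning_tokens / answer_tokens:.0f}x more reasoning than answer tokens")
```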

The ChatGPT o1 Pro model at $200/month is essentially the same model as the base $20/month ChatGPT o1; the difference is that o1 Pro thinks for much longer before responding, generating more reasoning tokens and consuming more inference compute for every response.
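
OpenAI's API exposes this dial directly for its o-series models through the reasoning_effort parameter. A hedged sketch; the model name is an assumption and availability varies by model and account:

```python
# Asking an o-series model to think longer via reasoning_effort.
from openai import OpenAI

client = OpenAI()
response = client.chat.completions.create(
    model="o1",                # assumed model name; availability varies
    reasoning_effort="high",   # "low" | "medium" | "high": more effort = more reasoning tokens
    messages=[{"role": "user", "content": "Prove that the square root of 2 is irrational."}],
)
print(response.choices[0].message.content)
```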

Importantly, this extra “mental” effort can transform smaller models into contenders that rival or even outperform their larger (and more expensive) counterparts.

DeepSeek R1

In early 2025, a group of Chinese researchers released DeepSeek R1, an open-source reasoning model that rivals o1 in performance.

Source: DeepSeek R1 (https://arxiv.org/abs/2501.12948)

While OpenAI maintains strict secrecy around o1's internal workings, DeepSeek R1 offers total transparency by displaying its reasoning process through intermediate tokens.

DeepSeek R1 running on author's MacBook Pro via LM Studio (https://lmstudio.com)
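
Because the reasoning is emitted as ordinary tokens, it is easy to inspect programmatically. Below is a sketch of querying a locally running R1 model through LM Studio's OpenAI-compatible server; the port is LM Studio's default, and the model identifier is an assumption that depends on which R1 variant you downloaded:

```python
# Querying a local DeepSeek R1 model through LM Studio's OpenAI-compatible
# server (default port 1234), then separating reasoning from the answer.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:1234/v1", api_key="lm-studio")

response = client.chat.completions.create(
    model="deepseek-r1-distill-qwen-7b",  # assumed identifier; match your download
    messages=[{"role": "user", "content": "Is 1009 a prime number?"}],
)
text = response.choices[0].message.content

# R1 wraps its intermediate reasoning tokens in <think>...</think> tags.
if "</think>" in text:
    reasoning, answer = text.split("</think>", 1)
    print("REASONING:\n", reasoning.replace("<think>", "").strip())
    print("\nANSWER:\n", answer.strip())
else:
    print(text)
```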

The group that produced DeepSeek R1 has not only published the model for download (https://github.com/deepseek-ai/DeepSeek-R1) but has also been very open in explaining how it works (https://arxiv.org/abs/2501.12948).

Interestingly, according to the paper, R1 developed a number of emergent behaviors that it wasn't explicitly trained to have:

  • Thinking longer on complex problems. R1 allocates more reasoning time and tokens to harder problems, scaling its effort with task difficulty.
  • Reflection. R1 revisits and reevaluates its previous steps.
  • Alternatives. R1 explores alternative approaches.

These characteristics closely mirror System 2 thinking – the deliberate, analytical mode of human cognition – and it's intriguing that such patterns emerged naturally within R1.

DeepSeek R1 achieves a dual cost advantage over OpenAI's o1: it was cheaper to develop and is cheaper to run when generating responses. Since reasoning models perform better the longer they think, cheaper inference is a significant step toward more advanced AI systems.

| Feature | DeepSeek R1 | OpenAI o1 |
| --- | --- | --- |
| Training approach | Reinforcement learning with minimal supervised data | Supervised fine-tuning (SFT) with RLHF |
| Input token cost (per million tokens) | $0.14 (cached), $0.55 (non-cached) | $7.50 (cached), $15.00 (non-cached) |
| Output token cost (per million tokens) | $2.19 | $60 (roughly 27x more expensive than R1) |
| Codeforces benchmark (percentile) | 96.3 | 96.6 |
| AIME 2024 benchmark | 79.8 | 79.2 |
| MATH-500 benchmark | 97.3 | 96.4 |

Geopolitics and CapEx

Access to computational power has emerged as a strategic geopolitical lever. US export controls on advanced chips assume that by restricting China's access to advanced computing capabilities, the US can maintain its lead in artificial intelligence development.

DeepSeek R1's development reveals a significant weakness in this strategy. Despite facing strict export controls, the Chinese company created R1 in just two months with a modest $6 million investment, achieving performance comparable to o1.

This not only casts doubt on the effectiveness of AI-related export controls, but also challenges the massive capital expenditure figures that OpenAI and Meta claim are necessary to maintain AI leadership.

Conclusion

The emergence of these reasoning models marks a crucial shift in AI development. The next frontier in artificial intelligence isn't just about making models bigger—it's about making them think better. As systems learn to think deliberately rather than just pattern-match, we're moving closer to AI that can tackle increasingly complex cognitive tasks with human-like deliberation. These developments also force us to reconsider our traditional assumptions about the computational requirements for both training and running AI models.