January 20, 2025 · World

Reasoning awakens through pure RL

January 20, 2025: DeepSeek-R1 open-sourced under the MIT license. The 671B-parameter release showed that reasoning can be learned through reinforcement learning: the R1-Zero variant was trained with no supervised fine-tuning at all, while R1 itself added a small cold-start stage before RL. R1 matched OpenAI's o1 on multiple reasoning benchmarks at roughly one twenty-seventh of o1's listed API price. Six distilled versions shipped alongside, down to 1.5B parameters.

DeepSeek-R1 was open-sourced under the MIT license. Its core result came from reinforcement learning: the companion model R1-Zero learned to reason with no supervised fine-tuning at all, and R1 built on that recipe with a small cold-start stage before RL. This “let the model learn to think on its own” path had been discussed academically for years, but few dared run it at full scale. DeepSeek did, and R1 matched OpenAI's o1 on multiple reasoning benchmarks at roughly one twenty-seventh of o1's listed API price. Six distilled versions shipped alongside, scaling down to 1.5B parameters so modest hardware could still run a capable reasoning model. R1 marked the first time the open-source community had an uncompromising option in hardcore reasoning, and the paper's description of a model trained via large-scale RL without supervised fine-tuning as a preliminary step became an electric moment for researchers everywhere.