Reasoning awakens through pure RL
DeepSeek-R1 was open-sourced under the MIT license. Its precursor, DeepSeek-R1-Zero, learned to reason entirely through reinforcement learning on the 671B-parameter base model, with no supervised fine-tuning beforehand; R1 itself added a small cold-start fine-tuning stage before RL to make its output more readable. This “let the model learn to think on its own” path had been discussed in academia for years, but few had dared to run it at full scale. DeepSeek did, and R1 matched OpenAI's o1 on multiple reasoning benchmarks at roughly one twenty-seventh of the cost to run. Six distilled versions shipped alongside, scaling down to 1.5B parameters so that modest hardware could still produce strong reasoning. R1 gave the open-source community its first uncompromising option for hard reasoning tasks, and the paper's finding that reasoning could emerge from RL alone, without supervised fine-tuning as a preliminary step, became an electric moment for researchers everywhere.
Sources