Breakthrough · Nov 2023 – Jul 2024
Prying the closed-source wall open with structural innovation
Less than four months after spinning out, DeepSeek shipped its first open Coder. The next nine months ran on a model-every-other-week cadence: LLM in November 2023, V2 in May 2024, and Coder V2 in June. V2's Multi-Head Latent Attention cut KV cache by 93.3%, and its price of 1 yuan (about $0.14) per million input tokens blew through the floor of China's API market. In July, Liang Wenfeng's line, 'China cannot remain a follower forever', broke through to English-speaking tech circles.
Events in this era
- November 2, 2023
First shot in code: who said open source couldn't?
November 2, 2023: DeepSeek-Coder shipped in four sizes (1.3B–33B), fully open from day one. Trained on 2 trillion tokens, 87% code and 13% natural language, covering 80+ programming languages. The 33B variant beat CodeLlama-34B on multiple benchmarks — less than four months after the company spun out.
- November 29, 2023
A bilingual base model, plainly delivered
27 days later came DeepSeek-LLM, a bilingual (Chinese and English) base model in 7B and 67B sizes. No launch event, no PR, just an arXiv paper and downloadable weights on Hugging Face. The 'a-model-every-other-week' minimalist release cadence became the company's signature.
- May 7, 2024
MLA rewrites attention; the price war begins
May 7, 2024: DeepSeek-V2 arrived with 236B total parameters, 21B active per token, and 128K context. Relative to DeepSeek 67B, V2 cut KV cache requirements by 93.3%, largely through Multi-Head Latent Attention (MLA), and lifted maximum generation throughput to 5.76x. Priced at 1 yuan (about $0.14) per million input tokens, V2 ignited China's API price war.
- June 2024
338 languages; coding parity with GPT-4 Turbo
June 2024: DeepSeek-Coder V2 arrived with an MoE architecture, support for 338 programming languages, and coding ability on par with GPT-4 Turbo on multiple benchmarks, only seven months after the original Coder release. The ceiling on open-source coding models had been broken through.
- July 2024
“China cannot remain a follower forever”
July 2024: Liang Wenfeng's in-depth interview with 36Kr. Three lines kept landing: 'Open-sourcing papers loses you nothing.' 'China cannot remain a follower forever.' 'More investment doesn't necessarily produce more innovation.' The English translation spread through overseas tech circles.
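The DeepSeek-V2 figures quoted in the May 2024 entry are easier to weigh once turned into ratios. The sketch below is only arithmetic on the numbers stated there (93.3% KV cache reduction, 236B total and 21B active parameters). The function names and the 100 GB baseline cache are hypothetical, chosen for illustration, and are not taken from DeepSeek's materials.

```python
# Figures quoted in the May 2024 timeline entry.
KV_CACHE_REDUCTION = 0.933   # fraction of KV cache saved
TOTAL_PARAMS_B = 236         # total parameters, in billions
ACTIVE_PARAMS_B = 21         # parameters active per token, in billions


def kv_cache_remaining(reduction: float) -> float:
    """Fraction of the baseline KV cache still needed after the reduction."""
    return 1.0 - reduction


def shrink_factor(reduction: float) -> float:
    """How many times smaller the cache is than the baseline."""
    return 1.0 / (1.0 - reduction)


def active_fraction(active_b: float, total_b: float) -> float:
    """Share of an MoE model's parameters used for each token."""
    return active_b / total_b


# Hypothetical baseline: a 100 GB KV cache (an assumption for illustration).
baseline_gb = 100.0
remaining_gb = baseline_gb * kv_cache_remaining(KV_CACHE_REDUCTION)

print(f"KV cache remaining: {kv_cache_remaining(KV_CACHE_REDUCTION):.1%}")
print(f"Shrink factor: {shrink_factor(KV_CACHE_REDUCTION):.1f}x")
print(f"Hypothetical {baseline_gb:.0f} GB cache becomes {remaining_gb:.1f} GB")
print(f"Active parameters: {active_fraction(ACTIVE_PARAMS_B, TOTAL_PARAMS_B):.1%}")
```

Read this way, a 93.3% reduction leaves about 6.7% of the baseline cache, roughly a 15-fold shrink, and about 8.9% of V2's parameters are active for any given token.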