What’s Behind DeepSeek’s R1 Model Upgrade and Its Rapid Ascent?
Just months after triggering a trillion-dollar stock market meltdown with its debut, Chinese AI startup DeepSeek is again proving it can compete with Silicon Valley’s best. The latest version of its reasoning model, R1, recently tied for first place with models from Google and Anthropic in a real-time coding competition, despite reportedly being developed at a fraction of the cost of its Western rivals.
The success of the notoriously secretive company has shattered the comforting view that US chip export controls had kept China’s AI capabilities years behind. DeepSeek’s rapid ascent raises critical questions: How did a little-known startup from Hangzhou catch up so quickly? And does its hyper-efficient approach signal a fundamental shift in the AI race, challenging the industry’s high-spending orthodoxy?
Who is the mystery founder behind DeepSeek?
Before January 2025, few people outside of China’s quant finance scene had heard of Liang Wenfeng. The 40-year-old founder of DeepSeek is exceptionally private; photos of him only became widely available after a high-profile meeting with President Xi Jinping. But behind the mystique is an “extraordinarily driven and talented” tech entrepreneur, according to a Bloomberg investigation.
Born to primary school teachers in a small village in Guangdong province, Liang excelled academically, studying machine learning and electronic engineering at the prestigious Zhejiang University. After graduating, he and two classmates founded High-Flyer Management, a quantitative hedge fund that used mathematical models to trade stocks. At its peak, the fund managed over $14 billion in assets and delivered average annualized returns of 35%.
High-Flyer developed a geeky, startup-like culture, with job postings referencing Sheldon from “The Big Bang Theory” and seeking coding “geeks” with “quirky brilliance.” Liang’s deep interest in AI was always present. A few months after OpenAI launched ChatGPT, he spun out DeepSeek in the spring of 2023 with a mission to tackle AI’s biggest challenges and crack artificial general intelligence.
How did DeepSeek achieve top performance so quickly?
The key to DeepSeek’s success lies in efficiency. While US giants were in an arms race to build bigger models on ever-larger clusters of expensive chips, DeepSeek focused on innovation in model architecture. It helped pioneer a technique called “sparsity.”
Instead of activating an entire large language model to answer a query (akin to using your whole brain for every single thought), sparsity partitions the model’s knowledge into smaller, specialized “expert” groups, a design known as a mixture of experts. When a query comes in, only the most relevant experts are activated. This makes the model far more computationally efficient, slashing both training and operational costs. One analyst compared it to firing up only the specific neurons needed for a task, rather than every gray cell.
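For readers who want to see the mechanics, here is a minimal, illustrative sketch of top-k expert routing in plain NumPy. Everything in it (the sizes, the random “experts,” the router) is invented for illustration; DeepSeek’s production mixture-of-experts layers are trained end to end and far more sophisticated, with load balancing across experts.

```python
import numpy as np

rng = np.random.default_rng(0)

D, N_EXPERTS, TOP_K = 16, 8, 2   # hidden size, total experts, experts used per token

# Toy "experts": each is a single weight matrix standing in for a feed-forward block.
expert_weights = rng.normal(size=(N_EXPERTS, D, D)) * 0.1
router_weights = rng.normal(size=(D, N_EXPERTS)) * 0.1

def moe_forward(x: np.ndarray) -> np.ndarray:
    """Route one token vector through only its TOP_K best-scoring experts."""
    logits = x @ router_weights                 # score every expert for this token
    top = np.argsort(logits)[-TOP_K:]           # indices of the k highest-scoring experts
    gates = np.exp(logits[top])
    gates /= gates.sum()                        # softmax over the chosen experts only
    # A dense model would run all N_EXPERTS matmuls; here we run just TOP_K of them.
    return sum(g * (x @ expert_weights[i]) for g, i in zip(gates, top))

token = rng.normal(size=D)
print(moe_forward(token).shape)   # (16,): same output size at a quarter of the compute
```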
This approach was partly born out of necessity. As Washington tightened chip export controls, Chinese developers were forced to find ways to do more with less.
Is it really that much cheaper to build?
This is the billion-dollar question. When DeepSeek released its V3 model in late 2024, it made the shocking claim that the model cost just $5.6 million to train. This figure, though likely referring only to the final training run, stands in stark contrast to the estimated $100 million OpenAI spent on its most advanced version of ChatGPT.
The claim was met with widespread skepticism, with some analysts arguing DeepSeek could not have pulled it off for less than a billion dollars. However, the underlying economics of the AI industry suggest a major cost disparity is plausible. OpenAI relies on Microsoft Azure, which in turn pays a premium for Nvidia’s market-dominant GPUs. Analysts estimate Nvidia’s gross margins on these chips are around 80%, a so-called “Nvidia tax” that Google bypasses by using its own custom TPU chips, giving it a potential 4x-6x cost advantage at the hardware level.
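The arithmetic behind that hardware gap is straightforward. As a rough back-of-the-envelope sketch (the 80% figure is the analysts’ estimate cited above; everything else is normalized for illustration):

```python
# Back-of-the-envelope: what an ~80% gross margin implies for chip buyers.
production_cost = 1.0                                # cost to make one accelerator
gross_margin = 0.80                                  # margin = (price - cost) / price
sale_price = production_cost / (1 - gross_margin)    # solves to 5x production cost

print(f"market price = {sale_price:.0f}x production cost")   # 5x
# A firm running its own chips pays closer to production cost plus design and
# fab overhead, which is roughly where the cited 4x-6x advantage comes from.
```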
DeepSeek’s efficiency-focused techniques like sparsity aim to achieve similar economics through software and architecture. The strategy has clearly rattled competitors and won admirers, including Nvidia’s own CEO, Jensen Huang, who called the R1 model “genuinely a gift to the world’s AI industry.”
How have US chip controls affected China’s AI?
Tensions over technology reached a fever pitch in 2022 and 2023, when Washington hit Beijing with two rounds of chip export controls, limiting sales from American firms like Nvidia. The controls posed a significant challenge for Chinese AI developers.
But necessity is the mother of invention. The restrictions forced Chinese companies to develop workarounds like sparsity and other efficiency-focused techniques. The result, exemplified by DeepSeek, is the emergence of models that can match or exceed Western counterparts on some benchmarks while using less computational power. This has led some, including Nvidia’s Huang and Anthropic CEO Dario Amodei, to argue that the export controls may have unintentionally accelerated China’s own AI development by forcing its engineers to build better, more efficient systems.
Despite the controls, a US House committee report in April 2025 alleged “significant” ties between DeepSeek and the Chinese government and claimed the company unlawfully stole data from OpenAI. The Chinese Embassy has rejected the claims.
What is DeepSeek’s strategy?
DeepSeek’s primary strategy is built on open-sourcing its models. While competitors like OpenAI and Anthropic keep their most powerful models proprietary, DeepSeek makes its code publicly available. This has two major benefits.
First, it allows for lightning-fast adoption. By making its models cheap and accessible, developers and companies around the world can quickly test and integrate them, as the sketch below illustrates. As a result, both Microsoft and Amazon now offer DeepSeek on their cloud services, and its models are used by AI search engine Perplexity alongside those from OpenAI and Anthropic.
Second, it’s a strategic play to undercut competitors. Bloomberg’s Saritha Rai described DeepSeek’s approach as making its models “so cheap that the world adopts it quickly and then it becomes mainstream,” effectively cutting out pricier proprietary competitors.
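As a concrete picture of that accessibility, here is a minimal sketch of pulling an open DeepSeek checkpoint from Hugging Face and generating text with the transformers library. The model ID is one of the small distilled R1 variants DeepSeek has published; treat the snippet as a starting point, not a production setup.

```python
# Minimal sketch: run an open DeepSeek checkpoint locally via Hugging Face.
# The distilled 1.5B R1 variant is small enough for modest hardware.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

prompt = "Explain mixture-of-experts models in one sentence."
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=80)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```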
Open-sourcing also helps navigate censorship issues. An early, untweaked DeepSeek model gives bland, official answers to questions about Taiwan or Xi Jinping. But by allowing developers to customize the model with their own data, it can be adapted for different cultural contexts, speeding its global acceptance.
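That kind of customization is typically done with lightweight fine-tuning rather than retraining. Here is a hypothetical sketch using the peft library’s LoRA adapters; the hyperparameters and target module names are placeholders for illustration, not recommendations.

```python
# Illustrative only: attach LoRA adapters to an open DeepSeek checkpoint so it
# can be fine-tuned on a developer's own data while the base weights stay frozen.
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

base = AutoModelForCausalLM.from_pretrained("deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B")
lora = LoraConfig(r=8, lora_alpha=16, target_modules=["q_proj", "v_proj"])
model = get_peft_model(base, lora)
model.print_trainable_parameters()   # only a tiny fraction of weights will train
# From here: fine-tune on your own dataset (e.g. with transformers.Trainer),
# then ship just the small adapter, leaving the base model untouched.
```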
What are the challenges ahead?
Despite its groundbreaking start, DeepSeek faces immense pressure. The AI race is a marathon run at a sprinter’s pace. Within China, giant rivals like Alibaba, Tencent, and ByteDance are now releasing their own highly competitive models, putting pressure on DeepSeek to innovate further.
The other major challenge is commercialization. It remains unclear how DeepSeek plans to make money from its largely free, open-source models. While its technology is gaining rapid adoption, the path to profitability has not yet been defined, a question that hangs over the entire open-source AI sector.