Chinese Startup DeepSeek's AI Model Outperforms Meta and OpenAI on a Shoestring Budget
A new large language model (LLM) from Chinese startup DeepSeek has generated significant buzz in the global AI industry by surpassing comparable models from Meta Platforms and OpenAI in benchmark tests. As reported by the South China Morning Post, DeepSeek's model boasts 671 billion parameters and was trained in just two months at a cost of US$5.58 million, a remarkably efficient use of resources compared with its larger competitors.
The company, based in Hangzhou, highlighted the model's impressive performance in a WeChat post. DeepSeek V3 is particularly noteworthy for its efficiency, having been trained with significantly less computing power than models developed by established tech giants.
The concept of "open weights" is key to DeepSeek's approach. This involves releasing only the pre-trained parameters of the model, allowing third parties to utilize it for inference and fine-tuning. However, crucial details like the model's training code, original dataset, architectural specifics, and training methodology are not made public.
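To make the distinction concrete, here is a minimal sketch of what an open-weights release enables in practice: anyone can download the published parameters and run the model for inference. The sketch assumes the checkpoint is distributed through Hugging Face under a repo id like "deepseek-ai/DeepSeek-V3" and that hardware capable of serving a model of this scale is available; the specifics are illustrative rather than DeepSeek's documented usage.

```python
# Illustrative only: loading open weights for inference with the
# Hugging Face transformers library. The repo id is an assumption,
# and a 671B-parameter model requires substantial GPU memory.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "deepseek-ai/DeepSeek-V3"  # assumed repo id
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",      # keep the dtype stored in the checkpoint
    device_map="auto",       # shard the weights across available GPUs
    trust_remote_code=True,  # the repo ships custom model code
)

prompt = "Explain what an open-weights model release is."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=100)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Note what the sketch does not require: the training code, dataset, or methodology. That is precisely the line an open-weights release draws, in contrast to fully open-source projects that publish the entire training pipeline.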
OpenAI co-founder Andrej Karpathy reacted to DeepSeek's technical report on X, stating, "DeepSeek making it look easy... with an open weights release of a frontier-grade LLM trained on a joke of a budget." This underscores the significant cost advantage DeepSeek achieved while delivering results comparable or superior to those of established players.
DeepSeek's achievement demonstrates the remarkable progress of Chinese AI firms despite US sanctions that have limited their access to advanced semiconductors crucial for model training. The company's success in developing a powerful LLM at a fraction of the typical cost highlights the potential for innovation and disruption within the Chinese AI sector.