Chinese AI startup DeepSeek is rattling markets. Here's what to know

DeepSeek says its R1 model performs on par with OpenAI's reasoning model for less cost and energy

Photo illustration of DeepSeek's logo on a smartphone screen. Illustration: Omer Taha Cetin/Anadolu (Getty Images)

A Chinese artificial intelligence startup is rattling Silicon Valley and Wall Street after it demonstrated AI models on par with OpenAI’s — for a fraction of the cost and energy.

At just over a year old, Hangzhou-based DeepSeek released results last week for its latest open-source reasoning model, DeepSeek-R1. The model performed comparably to OpenAI's reasoning models, o1-mini and o1, on several industry benchmarks.

In December, DeepSeek released a different model that it said cost just $5.6 million to train and develop on Nvidia H800 chips, which have reduced capabilities compared with the chips used by U.S. firms. Meanwhile, U.S. rivals such as OpenAI and Meta have touted spending tens of billions of dollars on cutting-edge chips from Nvidia (NVDA).

The release of DeepSeek-R1 has sparked a global sell-off in tech stocks, with Nasdaq, Dow Jones Industrial Average, and S&P 500 futures all falling Monday morning.

Here’s what to know about DeepSeek and its AI models.

What is DeepSeek?

The Chinese AI startup was founded in 2023 by Liang Wenfeng, co-founder of Chinese quantitative hedge fund High-Flyer Capital Management. DeepSeek was reportedly formed out of High-Flyer's AI research unit to focus on developing artificial general intelligence, or AGI, the point at which AI reaches human-level intelligence.

DeepSeek develops open-source models, meaning developers can access, inspect, and build on its software.

What did DeepSeek announce?

DeepSeek introduced its first-generation reasoning models, DeepSeek-R1-Zero and DeepSeek-R1, last week.

DeepSeek-R1-Zero was trained via large-scale reinforcement learning without supervised fine-tuning, DeepSeek said. The model "demonstrates remarkable reasoning capabilities" but struggles with "poor readability" and language mixing, according to the startup.

Meanwhile, the mobile app for DeepSeek's AI chatbot, also called DeepSeek, has surged to the top of Apple's (AAPL) App Store downloads, while the DeepSeek site has experienced outages from an influx of new users. The startup reported "large-scale malicious attacks" on Monday, prompting a temporary limit on new registrations.

The chatbot is powered by DeepSeek-V3, which DeepSeek said performed comparably with Meta's (META) Llama 3.1 and OpenAI's GPT-4o at its release in December.

Unlike ChatGPT and its other chatbot competitors, DeepSeek explains its “reasoning” before responding to inquiries. However, the Chinese-developed chatbot does not directly answer prompts about politically sensitive topics such as President Xi Jinping or Taiwan.

How does DeepSeek’s new AI model compare to competitors such as OpenAI and Meta?

According to DeepSeek, R1 performed comparably with OpenAI's and Meta's models on leading benchmarks such as AIME 2024, which tests mathematics, and the Massive Multitask Language Understanding (MMLU) benchmark, which evaluates general knowledge.

On the community-driven Chatbot Arena leaderboard, DeepSeek-R1 ranks just below Google's (GOOGL) Gemini 2.0 Flash Thinking model and ChatGPT-4o. DeepSeek-V3, meanwhile, falls just below OpenAI's o1-preview and full o1 models.

Meta, which also develops open-source models, is reportedly concerned that the next version of its flagship Llama will fall behind DeepSeek’s models. Specialized groups of researchers at Meta are looking into DeepSeek’s models for ways to improve the next Llama model, The Information reported, citing unnamed people familiar with the matter.

Why are Nvidia and other tech stocks falling?

DeepSeek’s seemingly efficient and competitive models could challenge Nvidia’s business, which relies on major AI firms such as OpenAI, Meta, and Google spending billions of dollars on its GPUs.

In a technical report for its V3 model, DeepSeek said it used a cluster of 2,048 Nvidia graphics processing units (GPUs) for training, far fewer than the tens of thousands of chips U.S. firms use to train similarly sized models. Meta, for example, used 16,000 of Nvidia's more powerful H100s to train its Llama 3 405B model.

Last week, Meta chief executive Mark Zuckerberg said the tech giant plans to invest between $60 billion and $65 billion in AI capital expenditures in 2025. He added that Meta's Llama 4 model is expected to "become the leading state of the art model" this year, and that the company plans to "build an AI engineer" that can contribute more code to its research and development efforts.

Meanwhile, OpenAI, SoftBank (SFTBY), and Oracle (ORCL) recently announced Stargate, a half-trillion-dollar AI infrastructure plan backed by the Trump administration. The new joint venture "intends to invest $500 billion over the next four years building new AI infrastructure for OpenAI in the United States," the AI startup said in a statement.

What could DeepSeek mean for U.S. chip restrictions going forward?

Aside from prompting questions about AI chip spending, DeepSeek's success challenges U.S. efforts to keep advanced chips out of China.

According to its technical report, DeepSeek used Nvidia's H800 chips to train its V3 model. The H800 is a less powerful version of the H100 that Nvidia is permitted to sell to Chinese firms under U.S. chip restrictions.

Before leaving office earlier this month, the Biden administration introduced additional measures aimed at keeping AI chips out of China. The new regulations reinforce and build on previous U.S. export controls restricting China's access to advanced semiconductors that can be used for AI and military development. Under the rules, foundries and packaging companies that want to export certain chips face a broader license requirement unless certain conditions are met.

The U.S. also published new guidelines curbing AI chip sales by U.S. firms, including Nvidia, to specific countries and companies. The export controls establish three tiers of chip restrictions, giving friendly nations full access to U.S.-made chips while adding new limits on others.