There is nothing particularly surprising about DeepSeek's success, but it certainly surprised a lot of people. China's newest large language model (LLM) is just as good as the cutting-edge stuff in the United States, but much cheaper to train and to use.
It's an impressive feat: because of export controls that limit China's access to the fastest NVIDIA chips, Chinese programmers have to work smarter than their American peers. They couldn't train the system as long, so they focused on training it more effectively, creating high-quality "synthetic" data rather than just more and more low-quality internet data. While there are dangers to synthetic data, companies have gotten a lot better at deploying it smartly and limiting the risk of model collapse. The DeepSeek programmers leaned hard on this approach to get more out of their training time, allowing them to build a smaller but still powerful model. It runs cheaper, too: it requires less compute and less electricity, making it a lot less wasteful.
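As a rough sketch of how that kind of synthetic-data loop works (the function names and the toy teacher and scorer below are illustrative stand-ins, not DeepSeek's actual pipeline): a stronger model drafts answers, a quality filter keeps only the good ones, and the curated set becomes training data for a smaller model.

```python
# Sketch of a synthetic-data loop: generate candidate examples with a strong
# "teacher", filter them by quality, and keep the survivors as training data.
# The teacher and scorer here are toy stand-ins so the sketch runs on its own.

def build_synthetic_dataset(generate, score, prompts, threshold=0.8):
    """Generate an answer for each prompt and keep only the high-scoring ones."""
    dataset = []
    for prompt in prompts:
        answer = generate(prompt)               # teacher model drafts an answer
        if score(prompt, answer) >= threshold:  # verifier / reward model filters it
            dataset.append({"prompt": prompt, "answer": answer})
    return dataset

if __name__ == "__main__":
    # Toy stand-ins: a canned "teacher" and a scorer that rewards longer answers.
    generate = lambda p: f"A worked, step-by-step answer to: {p}"
    score = lambda p, a: min(len(a) / 40, 1.0)
    prompts = ["What is 17 * 24?", "Explain gradient descent in one sentence."]
    print(build_synthetic_dataset(generate, score, prompts))
```

The filter is the whole game: fewer, better examples per hour of training beats an ever-larger pile of scraped text.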
But a quick glance at the summaries of AI in 2024 would tell you that models were getting more efficient, both to train and to run. The caveat in those summaries was that companies like Google, Microsoft, and OpenAI were all spending big on GPUs and data centers, telegraphing their strategy of throwing more compute at the AI problem. DeepSeek wasn't a surprise in terms of the technology--we knew smaller, excellent models were coming--but it was a surprise to investors who hoped that the future belonged to the rich.

A big problem for all of these approaches is that self-supervised machine learning, like that underlying LLMs, is based on compressing data. Machine learning tries to embed the statistics of language into its "weights," the billions of different connections between nodes. This treats the memory less like a file in your computer and more like a piece of muscle memory: you hand someone a fork and they instantly know what to do with it. Once embedded in the weights, the model can engage in "decompression," reproducing what it saw on the internet when given a little bit of data to start (with interpolations to fill in missing data). And there is pretty much always a better compression possible in the machine learning world, even if it takes time to find it.
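To make the compression framing concrete, here is a toy, self-contained sketch (nothing like a real LLM in scale, but the same principle): a character-level bigram model "trains" by counting, and the text then costs far fewer bits to encode under the model's statistics than the raw 8 bits per character. An LLM does the same thing with billions of learned weights instead of a table of counts.

```python
# Toy illustration of learning-as-compression: the better a model predicts the
# next character, the fewer bits a coder needs to encode the text with it.
import math
from collections import defaultdict

text = "the cat sat on the mat. the cat sat on the hat. " * 20
vocab = len(set(text))  # distinct characters in this toy corpus

# "Training": embed the statistics of the text into counts (our stand-in for weights).
counts = defaultdict(lambda: defaultdict(int))
for prev, nxt in zip(text, text[1:]):
    counts[prev][nxt] += 1

def prob(prev, nxt):
    # Add-one smoothing so unseen characters still get a nonzero probability.
    total = sum(counts[prev].values())
    return (counts[prev][nxt] + 1) / (total + vocab)

# "Compression": the cost of the text in bits under the model vs. raw encoding.
bits = -sum(math.log2(prob(p, n)) for p, n in zip(text, text[1:]))
print(f"under the model: {bits / (len(text) - 1):.2f} bits/char (raw: 8.00 bits/char)")
```

Decompression is the same table run the other way: feed it a little bit of starting text and it fills in the statistically likely continuation.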
DeepSeek pulled off the obvious trick: start smaller, work smarter, and aim for a better compression. Although it was a blow to investors, their success isn't surprising. And the researchers made the model "open-weight," which means anyone can download it, use it, build products on it, and so on. That makes it hard for companies to justify spending grips of money on OpenAI, a product that is more expensive without being more effective. But DeepSeek is still, at heart, a smaller version of the Same Thing. It isn't turning heads because it is novel or interesting in design; it's a good-but-small version of what American companies have.
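"Open-weight" in practice looks something like the snippet below, assuming the weights are pulled from Hugging Face with the standard transformers library; the model id is one of DeepSeek's published distilled checkpoints and is only an example, so swap in whatever fits your hardware.

```python
# Running an open-weight model locally: download the released weights and
# generate with them, no API contract with the vendor required.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B"  # example checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

inputs = tokenizer("Why can smaller models be cheaper to run?", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=100)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```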
The question is whether anyone has an idea for getting past the current limitations of LLMs. This is the trillion-dollar question: companies still don't have a lot of use for unreliable machines, so all of the expected value depends on somebody finally overcoming THE limitation of these systems (the same unreliability that was there a few years ago). DeepSeek is cool, and it points to how these models can be less energy-intensive and less expensive. But it doesn't solve the bigger problem: how the heck can anyone make money with these things?