- The Artificially Intelligent Enterprise
What you Need to Know About DeepSeek
Is China's new foundation model hype-worthy, or just an also-ran?
Usually, I don’t chase the news cycle. There are tons of outlets that are giving you up-to-the-minute news, many with minimal insights.
I typically take my time to digest and then offer a more thoughtful approach.
I was initially going to discuss OpenAI and Project Stargate this week, but I am postponing that to cover this week's big news story, DeepSeek. I feel that the story has filled the news cycle with a lot of FUD (fear, uncertainty, and doubt).
So here’s the headline. DeepSeek, a new large language model from a Chinese startup of the same name, recently took the news cycle by storm because it appeared to rival models from OpenAI and others at a fraction of the cost.
This sent investors, businesses, and governments scrambling.
Reports credited the launch with helping wipe more than $1 trillion in market value from stocks, mainly U.S.-based tech companies such as Nvidia.
So, this edition is a bit special because it provides some deeper analysis and my thoughts on the news.
So keep calm and AI on.


Want to see a magic trick? How about making over $1 trillion vanish in one day? That was DeepSeek’s debut among investors and the tech industry, as concerns about the Chinese AI startup sent shockwaves everywhere on Monday.
But this isn’t the first time something like this has happened. I’ve seen it before, perhaps not at this scale, in an earlier hype cycle around cloud computing. In the early days, moving workloads to the cloud was a novel approach. Until then, we had run data centers that were virtualized primarily with software made by VMware (now part of Broadcom). That shift is commonplace now, but it was very exciting then.
During that era, a new open source upstart technology called OpenStack was introduced. One of the vendors supplying it mentioned to my friend Reuven Cohen (now a legit AI thought leader you should follow if you want to get smart about AI) that PayPal would replace VMware’s 80,000 servers with OpenStack. Reuven wrote an article based on a reliable source, and carnage for VMware ensued.
The information turned out to be more optimistic than true. PayPal was only starting a small pilot, but the overblown report caused VMware's market cap to drop $2 billion. The takeaway is that sometimes hype kicks off hysteria in markets.

What You Need to Know About DeepSeek
Here’s my breakdown of the DeepSeek model and what I think are the tangible takeaways. I tried to make this approachable and leaned on an analogy; if you want to learn more, I’d dig into the writing of Wharton Professor Ethan Mollick, who offers very measured and intelligent commentary on DeepSeek. Anyhow, here’s my attempt to make the whole DeepSeek news cycle understandable.
Why is everyone freaking out over DeepSeek?
In the simplest of terms, DeepSeek showed how quickly it can catch up to the US AI industry by doing more with less than was previously believed possible. It’s also charging a lot less than everyone else. So when companies are talking about finally getting ROI on those AI investments, DeepSeek is making those calculations much harder.
So it’s just some more competition?
Imagine you make incredible cookies. People love them and feel like they’ll ultimately change how people eat. The downside is these cookies require many expensive ingredients and tools. But your investors don’t care because the cookies are great. And everyone will pay a fortune for them, meaning they’ll eventually make loads of money from backing you.
Then, suddenly, someone starts baking cookies using fewer, cheaper ingredients. The cookies are priced much lower than yours and taste just as good. What’s worse, they’re willing to share their recipe with everyone. That’s what’s happening between the US AI industry and DeepSeek.
But if we can do things cheaper than before, isn’t that good?
Not if you bet a lot of money on needing a ton of expensive infrastructure to succeed in the space. The general thesis has been that spending a lot on AI is okay because you can eventually make that money back. But DeepSeek’s cheaper models using older chips call those beliefs, and the trillions spent on AI infrastructure, into question. Meanwhile, customers are ready to give it a spin. According to an internal document, AWS already sees clients requesting access to DeepSeek models.
But what if the Chinese company isn’t telling the whole truth?
According to one report, Scale AI CEO Alexandr Wang said that DeepSeek has been using 50,000 Nvidia H100 chips but that workers aren't allowed to discuss it because of US export restrictions. He’s not the only one calling shenanigans; Elon Musk is skeptical too.
DeepSeek, being a Chinese company, adds another layer of complexity to the already tense geopolitical situation surrounding AI development. This situation further emphasizes the need for international collaboration and responsible AI governance. It's also important to remember that DeepSeek's models are likely trained on censored data from China, which could influence their outputs and limit their ability to address specific topics or perspectives.
Here’s a prime example: ask DeepSeek about things that China has done to violate human rights or oppress its citizens, and DeepSeek will stonewall you or defend the People’s Republic of China.

DeepSeek’s answers to questions about Tiananmen Square and the Uyghurs
And the market melted down?
For many tech stocks, yes. (Companies like Apple, which focuses on integration rather than building models, could benefit from this.) On Monday, energy companies, viewed as key players powering the AI revolution, also took a bath. In the big picture, DeepSeek may have exposed the market warts investors have been willing to overlook: it’s overvalued, and companies are overspending.
However, many of these companies don’t have valid business models today; they are based on speculation and future gains from today’s investments. The same thing happened in the early days of the Internet. We saw massive investment, but there were a large number of early failures. Take Pets.com, by far the most famous example of a failed company during the dot-com boom. It had a smart idea, selling pet supplies online, but the business lost $147 million in the first nine months of 2000.
We will see the same level of failures and successes, but I doubt that this DeepSeek news will have a material, technological effect on those companies' leadership. However, it might have a psychological effect.
So, is that a wrap on AI in the US?
Not in my opinion. It’s yet another turn of the crank. You might have heard the term “Jevons paradox” thrown around. That’s the idea that when better efficiency lowers the resources needed for each use, demand can grow so much that total consumption rises anyway. I think that outcome is inevitable here.
In the late '90s, I paid about $20 a month for internet access, which, after adjusting for inflation, is around $38.40 today. And what did that get me? A blazing-fast 56 Kbps connection on a good day.
Today, the average internet bill runs between $40 and $80 monthly, but the speeds have exploded. An average connection delivers 262 Mbps, roughly 4,678 times faster than dial-up. The raw price of internet service has increased, but when you break it down by cost per Kbps, bandwidth is roughly 2,246 to 4,491 times cheaper than in the dial-up era.
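If you want to check that arithmetic yourself, here is a minimal sketch in Python. The prices and speeds are the ones quoted above; the variable names and the 1 Mbps = 1,000 Kbps conversion are my own assumptions, and the results should match the figures in the text to within rounding.

```python
# Figures quoted in the text; names and the Mbps-to-Kbps conversion are illustrative.
DIALUP_PRICE_TODAY_USD = 38.40      # $20/month in the late '90s, inflation-adjusted
DIALUP_SPEED_KBPS = 56
MODERN_SPEED_KBPS = 262 * 1000      # 262 Mbps, assuming 1 Mbps = 1,000 Kbps
MODERN_PRICES_USD = (40, 80)        # low and high end of a typical monthly bill


def cost_per_kbps(monthly_price_usd, speed_kbps):
    """Monthly dollars paid for each Kbps of bandwidth."""
    return monthly_price_usd / speed_kbps


# How many times faster a modern connection is than dial-up.
speed_ratio = MODERN_SPEED_KBPS / DIALUP_SPEED_KBPS

# How many times cheaper each Kbps is today, for each end of the price range.
dialup_unit_cost = cost_per_kbps(DIALUP_PRICE_TODAY_USD, DIALUP_SPEED_KBPS)
cheaper_by = {
    price: dialup_unit_cost / cost_per_kbps(price, MODERN_SPEED_KBPS)
    for price in MODERN_PRICES_USD
}

print(f"Speed increase: about {speed_ratio:,.0f}x")
for price, factor in cheaper_by.items():
    print(f"At ${price}/month, each Kbps is about {factor:,.0f}x cheaper")
```

The point of the exercise is the shape of the result, not the exact digits: the monthly bill roughly doubled at most while the cost of a unit of bandwidth fell by three orders of magnitude.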
The bottom line? We’re paying a little more but getting exponentially more value. Internet infrastructure has transformed, and today’s speeds would have been unimaginable when we were all fighting for phone lines to log onto AOL.
I believe this will happen to AI. Just like internet speeds and costs have evolved dramatically, AI will become exponentially more powerful while becoming more accessible and affordable. Currently, cutting-edge models require significant computational power, but AI will be embedded seamlessly into everyday applications at a fraction of today's costs as innovation continues. In the same way that we take fast internet for granted, future AI advancements will be ubiquitous and indispensable.

Growth of Internet Usage over Time
What are the key players saying?
Sam Altman, CEO of OpenAI, has already responded to the DeepSeek challenge by announcing that OpenAI would accelerate the release of "better models." This indicates that the competition is heating up, and we can expect rapid advancements in the AI field in the coming months.
deepseek's r1 is an impressive model, particularly around what they're able to deliver for the price.
we will obviously deliver much better models and also it's legit invigorating to have a new competitor! we will pull up some releases.
— Sam Altman (@sama)
2:29 AM • Jan 28, 2025
Elon Musk is also skeptical of their numbers. Despite losing $20 billion in net worth, NVIDIA CEO Jensen Huang remained quiet on the topic, but NVIDIA made the following statement: “DeepSeek is an excellent AI advancement and a perfect example of test-time scaling; DeepSeek’s work illustrates how new models can be created using that technique, leveraging widely available models and compute that is fully export control compliant.” In response to an analyst’s question about DeepSeek’s impact on Meta’s AI spending, Meta CEO Mark Zuckerberg said spending heavily on AI infrastructure will continue to be a “strategic advantage” for Meta.
The Real Innovation
DeepSeek boasts a massive 671B total parameters, with 37B activated for each token. It is incredibly efficient because it uses a Mixture-of-Experts (MoE) architecture, in which only a fraction of these parameters is activated for any given token.
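To make the idea concrete, here is a toy sketch of Mixture-of-Experts routing in Python. It is a generic illustration, not DeepSeek's actual router: the eight experts, the random router scores, and the function names are all hypothetical. Only the 671B and 37B figures come from the text.

```python
import math
import random

TOTAL_PARAMS_B = 671    # total parameters, from the text
ACTIVE_PARAMS_B = 37    # parameters activated per token, from the text
active_fraction = ACTIVE_PARAMS_B / TOTAL_PARAMS_B


def softmax(scores):
    """Turn raw scores into weights that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def route_top_k(router_scores, k):
    """Return (expert_index, weight) pairs for the k highest-scoring experts."""
    ranked = sorted(range(len(router_scores)),
                    key=lambda i: router_scores[i], reverse=True)
    top = ranked[:k]
    weights = softmax([router_scores[i] for i in top])
    return list(zip(top, weights))


def moe_forward(x, experts, router_scores, k=2):
    """Run only the k selected experts and blend their outputs."""
    return sum(w * experts[i](x) for i, w in route_top_k(router_scores, k))


# Toy setup (hypothetical): 8 "experts" that simply scale their input.
experts = [lambda x, scale=scale: scale * x for scale in range(1, 9)]
rng = random.Random(0)
scores = [rng.gauss(0, 1) for _ in experts]   # stand-in for a learned router

chosen = route_top_k(scores, k=2)
output = moe_forward(3.0, experts, scores, k=2)

print(f"Share of parameters active per token: {active_fraction:.1%}")
print(f"Experts used for this token: {[i for i, _ in chosen]} of {len(experts)}")
```

The other six experts never run for this token, which is where the savings come from: the model can be very large in total while each token pays for only a small slice of it.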
But this isn’t our first time seeing this fanfare or type of model. In December 2023, the French AI company Mistral released a new model called "Mixtral 8x7B." This model uses a "Mixture of Experts" architecture, meaning it routes each piece of input to a few specialized "expert" sub-networks rather than running the whole model every time. This makes it significantly more efficient than a traditional dense model of similar size. The release marked a significant advance in the field of large language models.
Headlines from VentureBeat afterward: Europe’s largest seeded startup Mistral AI releases the first model, outperforming Llama 2 13B. Yet Meta and Llama kept growing mindshare. The New York Times wrote a few months later, Europe’s A.I. ‘Champion’ Sets Sights on Tech Giants in the U.S. Still, OpenAI kept its lead and raised an additional $6.6 billion in investment only a few months after that.
This jockeying will continue for a while, and I suspect the next big leap will be one of these players announcing they have achieved AGI (artificial general intelligence), the point at which AI matches or exceeds human intelligence. When that happens, the markets will go crazy again.
So what is the Innovation?
DeepSeek V3 is a very large and powerful language model. Think of it like a super-smart computer program that can understand and generate human-like text. To make this program work efficiently, the creators came up with some clever tricks:
Multi-Head Latent Attention (MLA): Imagine you have a library filled with books (information). When you need to find something specific, you use a card catalog (attention mechanism) to locate the right book. MLA is like a super-efficient card catalog that compresses all the information into a smaller, easier-to-search format. This allows the program to find what it needs much faster and without using as much memory.
Auxiliary-loss-free load balancing: Think of the program as a team of experts working together. Each expert has a specific skill. This trick ensures that the workload is evenly distributed among the experts, preventing any single expert from overloading. This makes the team work more efficiently and smoothly.
Multi-Token Prediction (MTP): Instead of learning to predict only the next word, this trick trains the program to predict several upcoming words at once. This is like thinking a few words ahead while you speak, which gives the program a richer training signal and can also help it generate text faster.
FP8 mixed precision training: This method uses a more efficient way to store and process numbers. It's like using shorthand instead of writing out every word in full. This saves space and speeds up calculations.
These innovations result in exceptional performance on various benchmarks, rivaling even leading closed-source models like GPT-4, while requiring significantly fewer computational resources and costs.
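Of these tricks, the load balancing one is the easiest to show in a few lines. The sketch below is my own toy simulation of the general idea as I understand it: each expert gets a bias that is added to its score only when choosing who gets the work, and after every batch the bias is nudged down for overloaded experts and up for underloaded ones. The number of experts, the skewed scores, and the step size are all hypothetical, and this is not DeepSeek's code.

```python
import random

NUM_EXPERTS = 4
BIAS_STEP = 0.05                    # hypothetical update speed
SKEW = [1.0, 0.5, 0.0, -0.5]        # hypothetical: the raw router favours expert 0


def pick_expert(scores, bias):
    """Choose the expert with the highest bias-adjusted score (top-1 routing)."""
    return max(range(len(scores)), key=lambda i: scores[i] + bias[i])


def update_bias(bias, load, step=BIAS_STEP):
    """Nudge overloaded experts down and underloaded experts up."""
    mean_load = sum(load) / len(load)
    new_bias = []
    for b, tokens in zip(bias, load):
        if tokens > mean_load:
            b -= step
        elif tokens < mean_load:
            b += step
        new_bias.append(b)
    return new_bias


def final_batch_load(balance, batches=60, tokens_per_batch=500, seed=0):
    """Simulate routing and return tokens per expert in the last batch."""
    rng = random.Random(seed)
    bias = [0.0] * NUM_EXPERTS
    load = [0] * NUM_EXPERTS
    for _ in range(batches):
        load = [0] * NUM_EXPERTS
        for _ in range(tokens_per_batch):
            # Noisy scores stand in for a learned router's output.
            scores = [mean + rng.gauss(0, 1) for mean in SKEW]
            load[pick_expert(scores, bias)] += 1
        if balance:
            bias = update_bias(bias, load)
    return load


def spread(load):
    """Gap between the busiest and the least busy expert."""
    return max(load) - min(load)


unbalanced = final_batch_load(balance=False)
balanced = final_batch_load(balance=True)
print("Tokens per expert without balancing:", unbalanced)
print("Tokens per expert with balancing:   ", balanced)
```

Notice that nothing here adds an extra penalty term to the training objective, which is the sense in which the approach is "auxiliary-loss-free": the workload evens out through the bias adjustments alone.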
DeepSeek V3's open source nature also means all its competitors can access the same “cookie recipe.” (The DeepSeek V3 whitepaper, their “cookie recipe,” is available here.)


The Final DeepSeek Takeaway
We are on a rollercoaster, experiencing an unprecedented technological advance, and we will experience many ups and downs before it all shakes out. Don't get caught up in the hysteria. Instead, focus on upskilling and monitoring the technology's progress. Education on how AI works and what it means to your business, government, and the world is essential.
I think we'll see AI commoditized in general. We will use models from multiple sources, and I believe we'll keep finding new ways to improve them. Consider Moore's law and what a computer was like in 2000 versus today. Moore's law is part of a broader phenomenon of exponential cost decline and performance increase.
Like desktop computers, storage, RAM, and processors, AI's cost will likely decrease while its capabilities increase exponentially. This will lead to broader adoption, new applications, and unforeseen innovations. The DeepSeek situation is just the beginning of this exciting and transformative journey.
This situation illustrates the dynamic nature of the AI industry and the potential for disruption from unexpected players. DeepSeek's emergence isn’t a wake-up call for the US tech industry; it’s a call for all of us to drive innovation that improves efficiency and to take a more open approach to AI development.
The market's reaction should be based on real, tangible results, not mass hysteria.

I appreciate your support.
Your AI Sherpa, Mark R. Hinkle