Carlos KiK

Google Just Made Every AI Model 6x Cheaper to Run. Memory Chip Stocks Crashed.

Google published a paper this week that did more damage to the AI hardware industry than any competitor could.

TurboQuant. A compression algorithm that shrinks LLM key-value caches from 16 bits per value down to 3. The result: 6x less memory required, 8x faster inference on NVIDIA H100 GPUs. Zero accuracy loss. Zero retraining.
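To get a feel for what dropping from 16 bits to 3 bits per value means, here is a back-of-the-envelope KV-cache sizing calculation. The model dimensions below are my own illustrative assumptions (roughly the shape of a large open-weights model), not figures from the paper; note that raw bit-width alone gives 16/3 ≈ 5.33x, so the headline 6x presumably includes savings beyond the per-value precision.

```python
# Back-of-the-envelope KV-cache sizing for a hypothetical model.
# Dimensions are illustrative assumptions, not from Google's paper.

def kv_cache_bytes(n_layers, n_kv_heads, head_dim, seq_len, bits_per_value):
    # 2x for keys and values; one entry per layer, head, position, and dim.
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * bits_per_value / 8

# Assumed config: 80 layers, 8 KV heads, head dim 128, 32K context.
fp16 = kv_cache_bytes(80, 8, 128, 32_768, 16)
q3   = kv_cache_bytes(80, 8, 128, 32_768, 3)

print(f"fp16 cache:  {fp16 / 2**30:.1f} GiB")   # → 10.0 GiB
print(f"3-bit cache: {q3 / 2**30:.1f} GiB")     # → 1.9 GiB
print(f"ratio: {fp16 / q3:.2f}x")               # → 5.33x from bit-width alone
```

At these assumed dimensions, a single 32K-context request drops from 10 GiB of cache to under 2 GiB, which is where the "serve many more concurrent requests per GPU" effect comes from.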

Samsung, SK Hynix, and Micron stocks dropped 5-6% within hours.

And I am sitting here thinking: did we not go through this exact same thing with DeepSeek just over a year ago?

Humans are extremely dumb sometimes

I am sorry, but it has to be said. The stock market panicked after DeepSeek showed you could train competitive models for a fraction of the cost. NVIDIA lost hundreds of billions in market cap. Then everything recovered because, surprise, you still need the hardware.

And now the same thing happens again. An optimization is announced, the mental framing kicks in: “less memory needed means less chips means sell everything”, and stocks crash.

It is the same mistake. The same idiotic mental framing. The same failure to learn from an example that happened just over a year ago.

In a capitalist system, you do not stop when things get cheaper. You increase. You go for more market, more people, more applications. When compute gets 6x cheaper, you do not use the same amount of compute and pocket the savings. You use 6x MORE compute and do things that were impossible before at the old price point.

This is how every technology cycle works. When bandwidth got cheaper, we did not use less internet. We invented streaming video. When storage got cheaper, we did not store less data. We invented the cloud. When inference gets 6x cheaper, we will not run fewer AI queries. We will run orders of magnitude more.

The memory chip companies are going to be fine. The investors who sold in panic are the ones who will regret it. Again.

The Pied Piper moment

If you have watched Silicon Valley, and I watched every episode religiously when it aired, this is the Pied Piper compression breakthrough, except it is real. A pure algorithmic improvement that makes everything cheaper without sacrificing anything.

The elegance is in the “no retraining” part. Most optimization techniques require retraining the model, which costs millions of dollars and weeks of compute. TurboQuant is applied after the fact. Any existing model, any existing deployment. Just compress and deploy.
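The "compress after the fact" idea can be sketched in a few lines. This is not TurboQuant's actual algorithm (the paper's method is not reproduced here); it is a minimal, generic uniform-quantization round-trip showing how an existing cache tensor can be squeezed into 3-bit codes plus a scale, with no retraining involved.

```python
# Toy sketch of post-hoc 3-bit uniform quantization of a cache slice.
# NOT Google's method; a minimal illustration of quantize -> dequantize.

def quantize_3bit(values):
    """Map floats to integer codes in [0, 7] (3 bits per value)."""
    lo, hi = min(values), max(values)
    scale = (hi - lo) / 7 if hi > lo else 1.0
    codes = [round((v - lo) / scale) for v in values]
    return codes, lo, scale

def dequantize_3bit(codes, lo, scale):
    """Recover approximate floats from the 3-bit codes."""
    return [lo + c * scale for c in codes]

# A fake slice of a key-value cache, quantized after the fact.
kv = [0.12, -0.40, 0.33, 0.05, -0.27, 0.48, -0.11, 0.02]
codes, lo, scale = quantize_3bit(kv)
assert all(0 <= c <= 7 for c in codes)          # each value fits in 3 bits

recovered = dequantize_3bit(codes, lo, scale)
max_err = max(abs(a - b) for a, b in zip(kv, recovered))
assert max_err <= scale / 2 + 1e-9              # error bounded by half a step
print(f"codes={codes}, max round-trip error={max_err:.4f}")
```

The round-trip error is bounded by half a quantization step; the hard part that real methods solve is keeping that error from compounding across thousands of cached positions, which is what makes the "zero accuracy loss" claim notable.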

That is beautiful engineering. The kind of thing that makes you appreciate what a well-placed mathematical insight can do compared to throwing more hardware at the problem.

A godsend, but the fire keeps growing

Here is what I want to be clear about: this is fantastic. If it works as advertised, it is a godsend that puts a little bit of the fire out. The compute demand is crushing everyone right now, and anything that reduces the pressure is welcome.

But the fire keeps growing. As more people transition to AI, as more jobs get automated, as more applications become possible at the new price point, the total compute demand is going to keep increasing. Fast. TurboQuant does not eliminate the need for hardware. It buys breathing room. And that breathing room will be consumed by new demand faster than most people expect.

We are in a race where what is currently available does not even cover the current need. This optimization is like finding a more efficient way to carry water when you are in a drought. It helps enormously, but the drought is still there and getting worse.

The implementation lag nobody mentions

One more thing that the stock market panic completely ignores: this is a paper. It is not shipping in production yet.

I am on the bleeding edge and I cannot download and test TurboQuant today. It is not integrated into the inference engines I use. With some luck, maybe in a couple of weeks. For the big companies running production systems, the implementation lag is measured in months, not days.

Things take time to get implemented. There is always a lag between “paper published” and “deployed at scale”. The benefits are real but they are not instant, and the stocks moved as if every AI datacenter woke up 6x more efficient overnight. They did not. They will not for a while.

What I keep coming back to

I have always believed that the biggest breakthroughs come from optimization, not from scale. DeepSeek proved it with training. TurboQuant proves it with inference. The pattern is the same: someone looked at the problem differently and found enormous free performance sitting on the table.

But here is the thing about optimization breakthroughs: they accelerate adoption, they do not slow it down. Every efficiency gain feeds back into the system as increased demand. The people building bigger GPUs should not be nervous. They should be excited, because the addressable market for their hardware just got 6x larger.

The stock market will figure this out. Again. Just like it did with DeepSeek. Probably in about two weeks, after the panic sellers have locked in their losses and the people who actually understand the dynamics have bought the dip.

I should have bought the dip last time too. Maybe this time I will actually follow through.


Sources: Google Research, TechCrunch


