
Deepseek R1 (curated notes & key points)

The AI space's 'sputnik moment'

I watched some YouTube videos and took some notes about the recent events in AI, specifically the launch and the effects of Deepseek's R1…

Dave's Garage (Superstar Microsoft Engineer):

Deepseek R1 is challenging assumptions about the AI space. People have gone from thinking "OpenAI & Anthropic have a serious stranglehold on the market" to "Wait a second…"

This is about global tech competition.

One thing that's disrupted Microsoft & Nvidia stock is that R1 has matched or exceeded the capabilities of the best American AI models AND for a fraction of the cost. It's impressive how cheaply R1 has pulled this off, allegedly for under $6 million (Deepseek is a side project of a hedge fund). Compare that cost with the tens of billions invested stateside, not to mention the $500 billion for Stargate.

All of this is concerning for American companies & investors, especially because R1 has allegedly pulled this off without access to Nvidia's latest chips. If that is true, it's akin to building a BMW out of spare parts from a different car. Inevitably, that would not be great for BMW's brand.

So What is Unique about R1?

It uses larger foundational models as scaffolding - like GPT-4 and Llama - to create something comprehensive yet far less demanding to run. R1 is a 'distilled' model. When you train a large model, you end up needing terabytes of data due to the billions/trillions of parameters. That requires a data center's worth of GPUs to function… But what if you didn't need all that power to conduct tasks? Enter distilled language models, or 'distillation'.

Distillation is where you use a larger model to train smaller ones.

  • like a craftsman teaching an apprentice - you don't need the apprentice to be perfect, you just need them to do the job well

R1 takes that approach to the extreme. They've figured out how to compress the knowledge & reasoning capabilities of much bigger systems into something much smaller that does not require massive data centers to operate. You can run these smaller models on consumer-grade computers. This is a game changer.
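To make the distillation idea concrete, here is a toy sketch (not Deepseek's actual training code, and the logit values are made up): the student model is trained to match the teacher's temperature-softened output probabilities, and the mismatch is measured with a KL divergence that shrinks toward zero as the student learns to mimic the teacher.

```python
import math

def softmax(logits, temperature=1.0):
    """Turn raw model scores (logits) into probabilities.
    A higher temperature 'softens' the distribution, exposing
    more of the teacher's relative preferences between answers."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL divergence between the teacher's and student's softened
    distributions - the quantity a distillation run minimizes."""
    p = softmax(teacher_logits, temperature)  # teacher's soft targets
    q = softmax(student_logits, temperature)  # student's current guess
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

# Hypothetical logits over 3 answer choices for one training prompt:
teacher = [4.0, 1.0, 0.5]
student = [2.0, 1.5, 1.0]
print(distillation_loss(teacher, student))  # positive; drops as the student improves
```

The key design point is training on *soft* probabilities rather than a single hard answer: the teacher's "90% A, 8% B, 2% C" carries far more information per example than just "A", which is part of why a small student can learn so much from relatively few examples.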

How does this Work?

Let's say you have a large model that knows everything about astrophysics, Shakespeare and coding… Instead of replicating all that raw computation, R1 trains smaller models to mimic the outputs of the larger model across a wide range of questions/scenarios.

So by carefully selecting examples and iterating over the training process, you can teach the smaller models to produce similar answers without having to store all that raw information themselves.

  • it's akin to copying all the answers without having to copy the entire library

But wait, it gets even crazier…

Deepseek R1 didn't rely on only a single large model for the process… It used multiple AIs - some open source, like Meta's Llama - which gives more diverse perspectives/solutions. Think of it like assembling a panel of experts to train the single brightest student.

By combining insights from different archetypes and data sets, R1 achieves a level of robustness & adaptability that's unprecedented in such a small model.
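One plausible way to combine several teachers - sketched here purely as an illustration, since the source doesn't describe Deepseek's actual method - is to average the teachers' output distributions into a single blended soft target for the student:

```python
import math

def softmax(logits):
    """Convert one teacher's raw scores into a probability distribution."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def ensemble_soft_target(teacher_logit_sets, weights=None):
    """Blend multiple teachers' distributions into one training target.
    Equal weights by default; in practice weights might reflect how
    much you trust each teacher on this kind of question."""
    n = len(teacher_logit_sets)
    weights = weights or [1.0 / n] * n
    dists = [softmax(logits) for logits in teacher_logit_sets]
    num_choices = len(dists[0])
    return [sum(w * d[i] for w, d in zip(weights, dists))
            for i in range(num_choices)]

# Hypothetical: three teachers score the same 3 answer options differently.
teachers = [[3.0, 1.0, 0.0], [2.5, 2.0, 0.5], [1.0, 3.0, 0.2]]
target = ensemble_soft_target(teachers)
print(target)  # one blended distribution, still summing to 1.0
```

Where the teachers disagree, the blended target spreads probability across options instead of committing to one answer, which is one intuition for how a multi-teacher student could end up more robust than any single-teacher copy.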

Since it is open source, it's harder to hide biases within the model because they're more discoverable.

Why Does All of This Matter?

  1. Dramatically Lowers AI's Barrier to Entry

    • doesn't require huge infrastructure. This is good for smaller companies, research labs and hobbyists looking to experiment cheaply

  2. Smaller models are, however, more likely to 'hallucinate'

    • less depth

    • less nuanced, less specialized

    • only as good as their teachers, meaning biases can trickle down into smaller models if the larger models were biased (not sure whether this would be discoverable in the source code)

  3. In the Stock Market

    • companies heavily reliant on AI licensing (e.g., Nvidia) could face lower projected growth and increased competition

Tom Bilyeu:

Basically, American AI companies spent a ton of money to train their models, and R1 is leveraging all that training by curating the learnings from all of them. There's less of a moat around American AI companies because R1 builds on what they have invested in, which has allowed Deepseek to create a cheap, open source mega model. It does so by extracting the reasoning from other trained models.

The shift in the AI space is that the models are going from regurgitating information about a topic to being proficient in the fundamentals and reasoning that underpins a topic.

The models are running predictive analysis that looks to validate their predictions against the known fundamentals of a topic/[laws of physics].

The market is reacting in a volatile fashion because some people might not want to hold stock in companies with big capital expenditures into the physical infrastructure (Stargate?).

So the 'play' in terms of 'fast money' is in the logical reasoning. Assuming that is true, and if the next big breakthrough will require training these models on topics like physics, then the market may become hostile toward that idea, because investors understand that the moment someone spends billions of dollars on that infrastructure, more companies like Deepseek will come along, thank them for it, and extract everything while driving costs way down. It's sort of like how all of us alive today gratefully extracted everything there is to know about electricity from people living in the 1700s, without ever having to do the hard stuff like flying kites in lightning storms.

Tom mentions that some people believe the CCP is behind R1 (launching it on inauguration day), and that America tried to make it hard for China to access the chips required for training these models, so China found a workaround using far inferior technology (what it had available) to make its model more efficient and produce equal or better outputs. So everyone went from thinking America had a year or two of runway before China caught up, but it turns out the lag is more like a month.