Discussion about this post

Filip:

Great writeup! I recommend other readers see Teortaxes’ response here: https://x.com/teortaxestex/status/1885695040825016664

As well as Epoch AI’s writeup on similar themes: https://epoch.ai/gradient-updates/what-went-into-training-deepseek-r1

Tim Duffy:

Excellent article; just one thing I'd add. As Josh You pointed out to me here, the 3x-per-year figure you cite covers pretraining algorithmic efficiency only; with post-training improvements included, the current rate of algorithmic efficiency improvement is more like 10x/year: https://x.com/justjoshinyou13/status/1884295329266426255

