Llama 3 Trivia: For the Enquiring Minds

Krishna Sankar
3 min read · Apr 21, 2024


Meet Llama 3, Meta’s latest leap into the world of Generative AI — this new foundation model is setting the stage for some exciting developments.

Wondering what makes Llama 3 tick and what its consequences are? Let me dive into the intuitive side of things and unpack what this could mean for the future of AI.

For those of you hungry for the hardcore technical details, check out these links: [Here], [Here], and GitHub [Here]. Let’s get into it!

1 — Correlation or Causation? — Nvidia falls 10% as Llama 3 hits the market

Maybe both. But rest assured, Llama 3 does make developing ground-up foundation models a luxury for most companies.

  • The word on the street is that Meta spent anywhere between $100M and $1B, over 7.7M GPU hours on (reportedly) 600,000 H100s — see the back-of-envelope sketch after this list.
  • Please add comments as and when you see more reliable numbers. The model card has good details.
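
To see why even the low end of those estimates is plausible, here is a back-of-envelope calculation. Only the 7.7M GPU-hours figure comes from the model card; the $/GPU-hour rates are my assumptions based on typical on-demand cloud pricing.

```python
# Back-of-envelope: raw compute cost of ~7.7M H100 GPU-hours at cloud rates.
# The hourly rates are assumptions (on-demand H100 pricing varies widely);
# only the GPU-hours figure comes from the Llama 3 model card.
gpu_hours = 7.7e6
low_rate, high_rate = 2.0, 5.0  # assumed $ per H100-hour

print(f"Compute alone: ${gpu_hours * low_rate / 1e6:.1f}M to ${gpu_hours * high_rate / 1e6:.1f}M")
# Compute alone: $15.4M to $38.5M
```

So the rumored $100M-$1B totals only make sense once you fold in hardware, data, people, and failed runs on top of the final training run.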

And it's free to use; a separate commercial license is required only if you have more than 700M monthly active users (I think almost all of us fall in that group of plebeians 🙃). If you don't believe me, please check their license!

2 — But Meta still can't get more R-E-S-P-E-C-T

“Basically they leveled the playing field in AI while keeping their edge in the social media market” — heard on Twitter or LinkedIn

3 — Llama 3 does disrupt the market, even though it is an evolution rather than a revolution

4 — Data is everything!

I think the major innovation in Llama 3 is the way they use data.

Llama 3's model definition is only about 300 lines of code (sitting on top of millions of lines of PyTorch, of course). You just need something robust that you can pour trillions of tokens of high-quality training data into (15T+ for Llama 3). (Ref: a nice write-up by Tobias Zwingmann on LinkedIn [Here].)
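
And since the architecture is so compact, trying the model is equally straightforward. Here's a minimal sketch using Hugging Face transformers (assumes you have accepted Meta's license and been granted access to the gated repo; the prompt and generation parameters are illustrative):

```python
# Minimal sketch: trying Llama 3 8B Instruct via Hugging Face transformers.
# Assumes you have accepted Meta's license and have access to the gated repo.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Meta-Llama-3-8B-Instruct"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # ~16 GB of weights for the 8B model
    device_map="auto",
)

messages = [{"role": "user", "content": "In one sentence, what is a foundation model?"}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(input_ids, max_new_tokens=64)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```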

  • Lots of interesting data-layer tweaks: a better tokenizer, a massive pre-training corpus, filtering of pre-training data, over-training, post-training data quality, curating the data mix (for best performance), and performing multiple rounds of QA on human annotations.
  • The evolution in architecture helped, but the attention to data quality and the data pipeline is the main innovation.
  • Their data-filtering pipelines include heuristic filters, NSFW filters, semantic deduplication approaches, and text classifiers to predict data quality. And much of this was done by Llama 2 itself! (A sketch of what such a pipeline might look like follows this list.)
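
Meta hasn't released this pipeline, so treat the following as a hypothetical sketch of its shape: cheap heuristic filters first, then deduplication, then learned NSFW and quality classifiers. Every threshold, helper name, and stub classifier here is my assumption, not Meta's code.

```python
# Hypothetical sketch of a pre-training data-filtering pipeline in the spirit
# of what the Llama 3 model card describes. This is NOT Meta's code: the
# thresholds, stub classifiers, and exact-hash dedup are simplifying
# assumptions (Meta reports *semantic* dedup and learned quality classifiers).
import hashlib
from typing import Callable, Iterable, Iterator

def heuristic_ok(doc: str) -> bool:
    """Cheap rule-based filters: length, character sanity, line repetition."""
    if len(doc) < 200:
        return False  # too short to be useful training text
    if sum(c.isalpha() for c in doc) / len(doc) < 0.6:
        return False  # mostly markup, numbers, or junk
    lines = doc.splitlines()
    if lines and len(set(lines)) / len(lines) < 0.5:
        return False  # heavily repeated boilerplate lines
    return True

def dedup_key(doc: str) -> str:
    """Exact dedup via a normalized hash; real pipelines add semantic dedup."""
    return hashlib.sha256(" ".join(doc.lower().split()).encode()).hexdigest()

def filter_corpus(
    docs: Iterable[str],
    quality_score: Callable[[str], float],  # stand-in for a learned classifier
    is_nsfw: Callable[[str], bool],         # stand-in for an NSFW filter
    threshold: float = 0.5,
) -> Iterator[str]:
    seen: set[str] = set()
    for doc in docs:
        if not heuristic_ok(doc):
            continue
        key = dedup_key(doc)
        if key in seen:
            continue
        seen.add(key)
        if is_nsfw(doc) or quality_score(doc) < threshold:
            continue
        yield doc

# Toy usage with stub classifiers (Meta reportedly used Llama 2 to help build
# the real quality classifier):
docs = ["a reasonably long and varied document about llamas " * 20, "too short"]
kept = list(filter_corpus(docs, quality_score=lambda d: 0.9, is_nsfw=lambda d: False))
print(len(kept))  # -> 1
```

Ordering cheap rule-based checks before the expensive model-based classifiers is the usual design choice at this scale: most junk gets discarded before a model ever has to score it.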

5 — OpenAI & Google will catch up

Not sure if this prediction will come true — it will be self-evident by the time you read this blog tomorrow!

I think this picture is a good footnote for this blog!
