The Thundering Herd Problem
You have a piece of data in a cache that just expired. At the exact same moment, 5,000 requests arrive for that data.
What happens next is chaos. 🧵
This is the Thundering Herd Problem.
It’s not just about a cache miss. It’s about thousands of processes waking up at the exact same time for the same reason, all stampeding toward a single resource.
Your job isn't just to fetch the data when the cache misses. Your job is to ensure that only one of those 5,000 processes does the work, while the other 4,999 wait patiently.
A classic example:
The homepage of a major news site caches the top story. The cache TTL is 60 seconds. At 12:01:00, the cache expires.
In the next 100 milliseconds, 5,000 users hit the homepage.
A junior engineer's design:
data = cache.get("top_story")
if not data:
    data = db.query("SELECT ...")
    cache.set("top_story", data)
All 5,000 requests see the cache miss at the same time. All 5,000 run the expensive database query simultaneously. Your database, which was protected by the cache, is instantly overwhelmed.
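A minimal in-process sketch of that failure mode, assuming a plain dict as the cache and a slow function standing in for the database (all names here are illustrative, not a real API):

```python
import threading
import time

cache = {}
db_queries = 0
counter_lock = threading.Lock()  # guards only the counter, not the work

def expensive_db_query():
    global db_queries
    with counter_lock:
        db_queries += 1
    time.sleep(0.1)  # slow query: every thread sees the miss before any fills it
    return "top story content"

def handle_request():
    data = cache.get("top_story")
    if data is None:                 # every thread observes the miss...
        data = expensive_db_query()  # ...so every thread queries the database
        cache["top_story"] = data
    return data

threads = [threading.Thread(target=handle_request) for _ in range(50)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(db_queries)  # typically 50: one query per concurrent request
```

Fifty threads stand in for the 5,000 requests; the effect is the same. The cache never had a chance to protect the database, because every request raced through the same miss window.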
A senior engineer's design uses a lock. This is often called "cache stampede protection" or "single-flighting."
Only one process goes to the database. The rest wait briefly and get the data from the cache after it has been repopulated.
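A sketch of that single-flight pattern under the same assumptions (in-process dict cache, illustrative names): one thread wins a non-blocking lock and rebuilds the entry, while the rest poll the cache until it is repopulated.

```python
import threading
import time

cache = {}
db_queries = 0
rebuild_lock = threading.Lock()

def expensive_db_query():
    global db_queries
    db_queries += 1   # only the elected leader ever runs this
    time.sleep(0.05)  # simulate a slow query
    return "top story content"

def get_top_story():
    data = cache.get("top_story")
    if data is not None:
        return data  # fast path: cache hit
    if rebuild_lock.acquire(blocking=False):
        # This thread is the leader: rebuild the entry for everyone.
        try:
            data = cache.get("top_story")  # re-check after winning the lock
            if data is None:
                data = expensive_db_query()
                cache["top_story"] = data  # publish before releasing the lock
            return data
        finally:
            rebuild_lock.release()
    # Everyone else: wait briefly, then re-read the cache.
    while True:
        time.sleep(0.01)
        data = cache.get("top_story")
        if data is not None:
            return data

results = []
threads = [threading.Thread(target=lambda: results.append(get_top_story()))
           for _ in range(50)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(db_queries)  # 1: a single leader queried the database; the rest waited
```

In production this gets harder: if your cache is shared across servers, the lock must be distributed too (e.g., an atomic set-if-not-exists key with a TTL), and it needs a timeout so a crashed leader doesn't block rebuilds forever.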
So, the real design question isn't: "What should happen on a cache miss?"
It's: "When this cache item expires, how do I elect a single leader to rebuild it while everyone else waits?"
When designing for high-concurrency caching, don't just plan for the miss. Plan for the stampede.