How Reddit Works 🔥

#76: Break Into Reddit Architecture for 100M Daily Users (5 Minutes)

Neo Kim

Jul 3

This post outlines Reddit's architecture. You will find references at the bottom of this page if you want to go deeper.

Note: This post is based on my research and may differ from real-world implementation.

June 2005.

Two university students decided to build a mobile-based food ordering service.

And they pitched their startup idea for seed capital.

Yet their idea got rejected as smartphones weren't widespread back then.

So they pivoted to create a social network and called it Reddit.

They ran the entire site on a single machine.

And stored accounts, posts, subreddits, and comments information in Postgres.

As more users joined, they set up separate machines for the app server and the database.

Although it temporarily solved their performance issues, there were new problems.

Here are some of them:

1. Scalability

Their growth rate was explosive.

Yet a single database and monolithic codebase became a bottleneck.

2. Latency

Popular posts received many votes and comments.

But there was a massive delay in showing the latest data.

Onward.

Reddit Architecture

Here’s how Reddit works:

1. Post Submission

They added more app servers and set up a load balancer in front for scalability.

Also, they scaled the database vertically to handle extra I/O operations. Yet there's a hardware limit with vertical scaling. So they partitioned the database to scale writes.

But processing a new post is an expensive operation because it must update different tables, such as posts, subreddits, and accounts. Besides, each submission goes through rate limit checks and spam filters.

So they use a job queue.

Processing a New Post Asynchronously Using a Job Queue

Here’s how it works:

They add a message to the job queue for each new post
The processor then pulls the job from the queue asynchronously
It handles the post and updates the databases
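The three steps above can be sketched in a few lines. This is a minimal illustration, not Reddit's actual code: the in-memory `job_queue`, the `posts_table` dict, and the function names `submit_post` and `processor` are all assumptions made for the example.

```python
import queue
import threading

# Hypothetical in-memory stand-ins for the job queue and the database.
job_queue = queue.Queue()
posts_table = {}

def submit_post(post_id, title):
    """Request path: enqueue the job and return right away."""
    job_queue.put({"post_id": post_id, "title": title})

def processor():
    """Worker: pull jobs off the queue and do the expensive work."""
    while True:
        job = job_queue.get()
        if job is None:  # sentinel value: stop the worker
            job_queue.task_done()
            break
        # Rate limit checks, spam filters, and table updates would go here.
        posts_table[job["post_id"]] = job["title"]
        job_queue.task_done()

worker = threading.Thread(target=processor, daemon=True)
worker.start()

submit_post(1, "Hello Reddit")
submit_post(2, "Second post")
job_queue.join()     # wait until the worker has handled both jobs
job_queue.put(None)  # ask the worker to stop
worker.join()
```

The point is that `submit_post` returns as soon as the message is on the queue. The slow work happens later in the processor.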

Ready for the best part?

2. Lists

They show each subreddit and front page as an ordered list of posts.

Here’s the query to fetch a list:

SELECT * FROM posts ORDER BY hot(upvotes, downvotes);

It selects all columns from the posts table
It then orders the posts by a custom function named ‘hot’, which takes upvotes and downvotes as inputs

Yet querying the database often for an ordered list is expensive and adds extra latency.

So they store the list of post IDs along with their rank on the cache server. Think of the cache as a denormalized index of posts. This makes it easy to look up the posts by primary key.
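Here's a rough sketch of that idea. Everything in it is an assumption for illustration: the `posts_db` and `cache` dicts stand in for the database and the cache server, the list name is made up, and `hot` is a placeholder score because this post doesn't give the real formula.

```python
# Hypothetical in-memory stand-ins: the cache holds (rank, post_id) pairs,
# while the full post rows stay in the database keyed by primary key.
posts_db = {
    101: {"title": "First post", "upvotes": 10, "downvotes": 2},
    102: {"title": "Second post", "upvotes": 50, "downvotes": 5},
    103: {"title": "Third post", "upvotes": 7, "downvotes": 7},
}

def hot(upvotes, downvotes):
    """Placeholder score; the real 'hot' function is not given in this post."""
    return upvotes - downvotes

def build_cached_list(post_ids):
    """Pre-compute the ordered list once instead of sorting on every read."""
    ranked = [
        (hot(posts_db[p]["upvotes"], posts_db[p]["downvotes"]), p)
        for p in post_ids
    ]
    return sorted(ranked, reverse=True)

cache = {"subreddit:python:hot": build_cached_list([101, 102, 103])}

def fetch_list(name, limit=25):
    """Read path: take IDs from the cache, then look up rows by primary key."""
    return [posts_db[post_id] for _, post_id in cache[name][:limit]]

titles = [post["title"] for post in fetch_list("subreddit:python:hot")]
```

The read path never sorts anything. It only walks the cached IDs and does primary key lookups.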

Also, they set up replica databases for read scalability.

A single Reddit post is part of many lists.

So they pre-compute each list using a job queue and store the results on the cache server for performance. Besides, they store the popular lists in Cassandra for durability.

New post submissions and voting will change the list order. But voting is an expensive operation because it must invalidate the cache and update different tables, such as posts, comments, and accounts. So they use a job queue.

Here’s how voting works:

They add a message to the job queue
The processor pulls the job from the queue asynchronously
It then invalidates the cache and updates the databases

They update the cached list through a 'read-mutate-write' operation: read the list, change it, and write it back.

Yet there's a risk of a race condition when many processors handle votes on the same post at the same time. So they use locks. A lock ensures correctness when many processors try to update the same list.
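A minimal sketch of a locked read-mutate-write, under loud assumptions: the `cache` dict, the per-list `threading.Lock`, and the `apply_vote` helper are hypothetical and run in a single process. A production setup would need a distributed lock instead.

```python
import threading

# Hypothetical stand-ins: a dict for the cache and one in-process lock per list.
cache = {"front_page": {"post_1": 0}}
locks = {"front_page": threading.Lock()}

def apply_vote(list_name, post_id, delta):
    """Apply one vote to a cached list while holding that list's lock."""
    with locks[list_name]:  # only one processor mutates the list at a time
        listing = dict(cache[list_name])                    # read
        listing[post_id] = listing.get(post_id, 0) + delta  # mutate
        cache[list_name] = listing                          # write

threads = [
    threading.Thread(target=apply_vote, args=("front_page", "post_1", 1))
    for _ in range(100)
]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Without the lock, two processors could read the same old list and one of the votes would be lost on write.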

They use ZooKeeper for locking as it offers high availability.

But the time spent waiting for locks skyrocketed during peak traffic. A user's vote could wait in the queue for hours before processing, which hurt the user experience.

And adding more processors only meant extra lock contention. It happens because different processors try to acquire the lock for the same popular post. And it further slows down the queue.

So they partitioned the queue. And put votes into different queues based on the subreddit the voted post is in.

They do a modulo N operation on the subreddit ID to find the correct queue. For example, if there are 3 queues, they do a modulo 3 on the subreddit ID.
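The modulo routing can be sketched like this. `NUM_QUEUES`, `queue_for`, and `enqueue_vote` are hypothetical names, and plain Python lists stand in for the real queues.

```python
NUM_QUEUES = 3  # matches the example above; the real queue count is not stated

def queue_for(subreddit_id: int) -> int:
    """Pick a queue index with a modulo N operation on the subreddit ID."""
    return subreddit_id % NUM_QUEUES

# Plain lists stand in for the partitioned vote queues.
vote_queues = [[] for _ in range(NUM_QUEUES)]

def enqueue_vote(subreddit_id, post_id, direction):
    """Route a vote to the queue that owns its subreddit."""
    vote_queues[queue_for(subreddit_id)].append((subreddit_id, post_id, direction))

enqueue_vote(7, 1001, +1)  # 7 % 3 == 1
enqueue_vote(7, 1002, -1)  # same subreddit, same queue
enqueue_vote(9, 2001, +1)  # 9 % 3 == 0
```

All votes for one subreddit land in the same queue. So the processors working on other queues never ask for that subreddit's locks.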

The processors then handle different queues. Thus fewer processors ask for the same lock at once. It allowed them to scale well with the same number of processors while spending far less time waiting for locks.

Let’s keep going!

3. Comments

Popular Reddit posts usually receive many comments with nested replies.

Yet it’s expensive to find the metadata of every comment on a single post. For example, the parent-child relationship between comments.

Denormalized List With Comments Metadata

So they keep a denormalized list with metadata of the entire comment tree. Put another way, the database explicitly stores the tree's hierarchical relationships. It lets them easily find the relevant subset of the tree and look up only those comments.
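One way to picture it, as a sketch only: the `comment_meta` rows, the `children` map, and the `subtree_ids` helper are hypothetical, and the real storage layout is not described in this post.

```python
from collections import defaultdict

# Hypothetical denormalized metadata: (comment_id, parent_id) pairs.
# A parent_id of None marks a top-level comment.
comment_meta = [(1, None), (2, 1), (3, 1), (4, 2), (5, None)]

# Build the parent -> children map once from the metadata.
children = defaultdict(list)
for comment_id, parent_id in comment_meta:
    children[parent_id].append(comment_id)

def subtree_ids(root_id, max_depth):
    """Return the IDs under root_id in display order, down to max_depth levels."""
    result = []

    def walk(comment_id, depth):
        result.append(comment_id)
        if depth < max_depth:
            for child in children.get(comment_id, []):
                walk(child, depth + 1)

    walk(root_id, 0)
    return result
```

With the IDs in hand, the app only needs to fetch those comment bodies. For example, `subtree_ids(1, 1)` covers comment 1 and its direct replies, and skips the deeper reply.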

But updating the materialized tree structure with new or edited comments is expensive. It needs many row updates and hierarchy order maintenance. So they use a job queue. It handles changes in batches for efficiency.

Isolated Processing Capacity With Separate Queues

Yet a trending post might receive many comments in a short period. And it could slow down the queue for every other post by hogging resources. So they assign separate queues for trending posts. It gives isolated processing capacity.

Besides, they set up a queue quota, which means each queue has a maximum length. Thus a single queue wouldn't consume all resources. Although this might delay some comments, the system would stay functional.
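A queue quota can be sketched with a bounded queue. The quota value and the `enqueue_comment` helper are assumptions for the example, and what happens to a rejected comment (retry later here) is also an assumption.

```python
import queue

MAX_QUEUE_LENGTH = 3  # hypothetical quota; the real limit is not stated
comment_queue = queue.Queue(maxsize=MAX_QUEUE_LENGTH)

def enqueue_comment(comment):
    """Return True if the queue accepted the comment, False if it is at quota."""
    try:
        comment_queue.put_nowait(comment)
        return True
    except queue.Full:
        return False  # assumption: the caller retries later, so the comment is delayed

results = [enqueue_comment(f"comment {i}") for i in range(5)]
```

The first three comments fit within the quota. The last two are turned away instead of letting the queue grow without bound.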

TL;DR

They use a CDN to route requests based on the URL path and cookie information. It also serves media files for low latency.

High-Level Overview of Reddit Architecture

They use a load balancer to isolate request paths. It routes requests to different services based on the request path. Put simply, a slow service wouldn't affect the front page. This enables a modular architecture and scalability.

The client talks to the API gateway over HTTP, while the backend uses Thrift for strong schemas.
They use a microservices architecture and run Python on app servers.
They use consistent hashing to distribute traffic across cache servers. It reduces data re-distribution on server failures.
They store posts, votes, and comments in separate databases for scalability.
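The consistent hashing point can be shown with a small hash ring. This is a generic textbook sketch, not Reddit's implementation: the `HashRing` class, the server names, the MD5 hash, and the 100 virtual nodes per server are all assumptions.

```python
import bisect
import hashlib

def _hash(key: str) -> int:
    """Map a string to a point on the ring."""
    return int(hashlib.md5(key.encode()).hexdigest(), 16)

class HashRing:
    def __init__(self, servers, vnodes=100):
        # Each server owns many points on the ring to even out the load.
        self._ring = sorted(
            (_hash(f"{server}#{i}"), server)
            for server in servers
            for i in range(vnodes)
        )
        self._points = [point for point, _ in self._ring]

    def server_for(self, key: str) -> str:
        """Walk clockwise from the key's point to the next server point."""
        idx = bisect.bisect(self._points, _hash(key)) % len(self._ring)
        return self._ring[idx][1]

keys = [f"post:{i}" for i in range(1000)]
before = HashRing(["cache-a", "cache-b", "cache-c"])
after = HashRing(["cache-a", "cache-b"])  # cache-c has failed

# Count keys that lived on a surviving server and still had to move.
moved_from_survivors = sum(
    1 for k in keys
    if before.server_for(k) != "cache-c"
    and before.server_for(k) != after.server_for(k)
)
```

Only the keys that lived on the failed server get a new home. Keys on the surviving servers stay put, which is why `moved_from_survivors` stays at zero.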

Their traffic increases during the day and drops at night. So they set up an autoscaler to save money and react to changing demand automatically. It watches utilization metrics and adjusts capacity accordingly.
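A simple way to think about the autoscaler is a proportional rule: keep utilization near a target. This is a sketch under assumptions. The `desired_servers` function, the 60% target, and the server limits are made up for the example, and this post doesn't say which rule Reddit uses.

```python
def desired_servers(current, utilization_pct, target_pct=60,
                    min_servers=2, max_servers=100):
    """Scale the server count so utilization moves toward the target.

    Integer percentages avoid floating point rounding surprises.
    """
    # Ceiling division: round up so we never under-provision.
    wanted = (current * utilization_pct + target_pct - 1) // target_pct
    return max(min_servers, min(max_servers, wanted))

daytime = desired_servers(current=10, utilization_pct=90)    # scale out
nighttime = desired_servers(current=10, utilization_pct=15)  # scale in
quiet = desired_servers(current=10, utilization_pct=5)       # clamped to the minimum
```

The minimum keeps a safety margin at night, while the maximum caps the bill during a spike.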

And Reddit remains the 9th most visited site in the world.

References
