How Reddit Works
You are now 156,001+ subscribers strong. Let’s try to reach 157k subscribers by 10 July. Share this post & I'll send you some rewards for the referrals. Get my system design playbook for FREE on newsletter signup: This post outlines Reddit's architecture. You will find references at the bottom of this page if you want to go deeper.
Note: This post is based on my research and may differ from real-world implementation. June 2005. Two university students decide to build a mobile based food ordering service. And pitched their startup idea for seed capital. Yet their idea got rejected as smartphones didn’t exist back then. So they pivoted to create a social network and called it Reddit. They ran the entire site on a single machine. And stored accounts, posts, subreddits, and comments information in Postgres. As more users joined, they installed separate machines for app server and database. Although it temporarily solved their performance issues, there were new problems. Here are some of them: 1. ScalabilityTheir growth rate was explosive. Yet a single database and monolithic codebase became a bottleneck. 2. LatencyPopular posts received many votes and comments. But there was a massive delay in showing the latest data. Onward. CodeRabbit: Free AI Code Reviews in VS Code - SponsorCodeRabbit brings real-time, AI-powered code reviews straight into VS Code, Cursor, and Windsurf. It lets you:
Reddit ArchitectureHere’s how Reddit works: 1. Post SubmissionThey added more app servers and set up a load balancer in front for scalability. Also they scaled the database vertically to handle extra I/O operations. Yet there’s a hardware limit with vertical scaling. So they partitioned the database to scale writes. But processing new posts is an expensive operation. Because it must update different tables, such as posts, subreddits, and accounts. Besides each submission goes through rate limit checks and spam filters. So they use a job queue. Here’s how it works:
Ready for the best part? 2. ListsThey show each subreddit and front page as an ordered list of posts. Here’s the query to fetch a list:
Yet querying the database often for an ordered list is expensive and adds extra latency. So they store the list of post IDs along with their rank on the cache server. Think of the cache as a denormalized index of posts. This makes it easy to look up the posts by primary key. Also they installed replica databases for read scalability. A single Reddit post is part of many lists. So they pre-compute each list using a job queue and store the results on the cache server for performance. Besides they store the popular lists in Cassandra for durability. New post submissions and voting will change the list order. But voting is an expensive operation. Because it must invalidate the cache and update different tables, such as posts, comments, and accounts. So they use a job queue. Here’s how voting works:
They update the cache through an atomic 'read-mutate-write' operation. Yet there’s a risk of a race condition when many processors handle votes on the same post at the same time. So they use locks. It ensures correctness when many processors access the same queue. They use Zookeeper for locking as it offers high availability. But the time spent waiting for locks skyrocketed during peak traffic. It means a user’s vote waits in the queue for hours before processing, thus affecting the user experience. While more processors mean extra lock contention. It happens because different processors try to acquire the lock for a popular post. And it further slows down the queue. So they partitioned the queue. And put votes into different queues based on the subreddit the voted post is in. They do a modulo N operation on the subreddit ID to find the correct queue. For example, if there are 3 queues, they do a modulo 3 on the subreddit ID. The processors then handle different queues. Thus fewer processors ask for the same lock at once. It allowed them to scale well with the same number of processors without waiting for locks. Let’s keep going! 3. CommentsPopular Reddit posts usually receive many comments with nested replies. Yet it’s expensive to find the metadata of every comment on a single post. For example, the parent-child relationship between comments. So they keep a denormalized list with metadata of the entire comment tree. Put another way, the database explicitly stores the tree's hierarchical relationships. It lets them easily find the relevant subset of the tree and look up only those comments. But updating the materialized tree structure with new or edited comments is expensive. It needs many row updates and hierarchy order maintenance. So they use a job queue. It handles changes in batches for efficiency. Yet a trending post might receive many comments in a short period. And it could make the queue slower because of resource hogging. So they assign separate queues for trending posts. It gives isolated processing capacity. Besides they set up a queue quota, which means each queue has a maximum length. Thus a single queue wouldn’t consume all resources. Although this might delay some comments, the system would stay functional. TL;DRThey use CDN to route requests based on the URL path and cookie information. Also it serves media files for low latency. They use a load balancer to isolate request paths. It routes requests to different services based on the request path. Put simply, a slow service wouldn't affect the front page. Thus enabling modular architecture and scalability.
Their traffic increases during the day and drops at night. So they set up an autoscaler to save money and react to changing demand automatically. It watches utilization metrics and adjusts capacity accordingly. While Reddit remains the 9th most visited site in the world. Subscribe to get simplified case studies delivered straight to your inbox: Want to advertise in this newsletter? 📰 If your company wants to reach a 156K+ tech audience, advertise with me. Thank you for supporting this newsletter. You are now 156,001+ readers strong, very close to 157k. Let’s try to get 157k readers by 10 July. Consider sharing this post with your friends and get rewards. Y’all are the best. TL;DR 🕰️ You can find a summary of this article here. Consider a repost if you find it helpful. References
Share this post & I'll send you some rewards for the referrals. |