Why Netflix, Instagram, and Twitter Pick Different Databases

Choosing a database isn’t about finding the “best” technology; it’s about matching a database’s strengths to your specific access patterns. In a recent architectural deep dive, the channel ByteMonk explained why three of the world’s largest platforms chose fundamentally different database paths. [00:23]


1. Netflix: High-Write Throughput with Cassandra

Netflix handles over 3 million writes per second as it tracks every pause, hover, and search from 260 million subscribers. [01:20]

  • The Choice: Apache Cassandra.
  • Why: Cassandra is built for horizontal write scaling. It behaves like a distributed hash map: each row’s partition key is hashed to decide which nodes own it, so a write is routed straight to those nodes with minimal coordination overhead. [01:46]
  • The Trade-off: No joins or ad-hoc SQL queries. Netflix must model its data around specific queries rather than entities, often duplicating data across different tables to ensure every read is a simple key lookup. [02:27]
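
To make the “distributed hashmap” idea and the query-first data model more concrete, here is a minimal in-memory sketch in Python. It is not Cassandra’s implementation and not Netflix’s schema: the node names, the `TinyCluster` class, and the `events_by_user` / `events_by_title` tables are hypothetical, and the modulo-on-a-hash routing stands in for Cassandra’s token ring (which uses Murmur3 by default).

```python
import hashlib
from collections import defaultdict

NODES = ["node-a", "node-b", "node-c"]  # hypothetical three-node cluster


def node_for(partition_key: str) -> str:
    """Pick the owning node by hashing the partition key (simplified routing)."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return NODES[int.from_bytes(digest[:8], "big") % len(NODES)]


class TinyCluster:
    """Toy stand-in for a partitioned store: node -> table -> key -> rows."""

    def __init__(self):
        self.storage = {
            node: defaultdict(lambda: defaultdict(list)) for node in NODES
        }

    def write(self, table: str, partition_key: str, row: dict) -> None:
        # A write touches only the node that owns the partition key.
        self.storage[node_for(partition_key)][table][partition_key].append(row)

    def read(self, table: str, partition_key: str) -> list:
        # A read is a plain key lookup on one node: no joins, no scans.
        return list(self.storage[node_for(partition_key)][table][partition_key])


def record_event(cluster: TinyCluster, user_id: str, title_id: str, action: str) -> None:
    """Store the same event twice, once per query it has to serve."""
    row = {"user_id": user_id, "title_id": title_id, "action": action}
    cluster.write("events_by_user", user_id, row)    # "what did this user do?"
    cluster.write("events_by_title", title_id, row)  # "what happened to this title?"
```

The duplication in `record_event` is the trade-off described above: each question the application asks gets its own table keyed for that question, so storage and write volume go up in exchange for reads that never need a join.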

2. Instagram: Relational Complexity with PostgreSQL

Instagram’s core workload is read-heavy and highly relational. Feeds require joining posts with follow relationships, and profiles need aggregated counts. [03:35]

  • The Choice: PostgreSQL.
  • Why: PostgreSQL excels at joins, aggregations, and complex filtering. Instagram proved that you don’t need NoSQL just because you have a billion users; you can scale SQL using connection pooling (PgBouncer), read replicas, and partitioning. [05:07]
  • The Trade-off: Massive write volumes are harder to handle than in Cassandra. Instagram accepts the engineering complexity of sharding and indexing to keep the flexibility of relational queries. [05:32]
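
The kind of relational query described here can be sketched as follows. To keep the example runnable without a database server, it uses Python’s built-in `sqlite3` module as a stand-in for PostgreSQL; the SQL shown is plain enough to work in both. The `posts` and `follows` tables, their columns, and the sample rows are assumptions for illustration, not Instagram’s actual schema.

```python
import sqlite3


def build_demo_db() -> sqlite3.Connection:
    """Create a tiny in-memory database with a hypothetical posts/follows schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE follows (follower_id INTEGER, followee_id INTEGER);
        CREATE TABLE posts (
            id INTEGER PRIMARY KEY,
            author_id INTEGER,
            caption TEXT,
            created_at INTEGER
        );
        -- Indexes that keep the feed join from scanning whole tables.
        CREATE INDEX idx_follows_follower ON follows (follower_id);
        CREATE INDEX idx_posts_author_time ON posts (author_id, created_at);
        """
    )
    conn.executemany("INSERT INTO follows VALUES (?, ?)", [(1, 2), (1, 3), (4, 2)])
    conn.executemany(
        "INSERT INTO posts (author_id, caption, created_at) VALUES (?, ?, ?)",
        [(2, "sunset", 100), (3, "coffee", 200), (5, "unfollowed", 300)],
    )
    return conn


def feed_for(conn: sqlite3.Connection, user_id: int, limit: int = 20) -> list:
    """Feed = posts joined with the accounts this user follows, newest first."""
    rows = conn.execute(
        """
        SELECT p.caption
        FROM posts AS p
        JOIN follows AS f ON f.followee_id = p.author_id
        WHERE f.follower_id = ?
        ORDER BY p.created_at DESC
        LIMIT ?
        """,
        (user_id, limit),
    )
    return [row[0] for row in rows]


def follower_count(conn: sqlite3.Connection, user_id: int) -> int:
    """Profile counter expressed as an aggregate over the follows table."""
    row = conn.execute(
        "SELECT COUNT(*) FROM follows WHERE followee_id = ?", (user_id,)
    ).fetchone()
    return row[0]
```

With the sample rows, `feed_for(conn, 1)` returns the captions from the two followed authors and skips the post by the account user 1 does not follow. Neither question had to be anticipated at schema-design time, which is the flexibility the relational model buys.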

3. Twitter: Ultra-Low Latency with Redis

Twitter’s challenge is the timeline. When you open the app, you expect to see a merged list of tweets from thousands of accounts instantly. [06:09]

  • The Choice: Redis (as a cache).
  • Why: Redis operates entirely in memory, serving precomputed timelines at 300,000 requests per second. Twitter uses a “Fan-out on Write” approach, pushing new tweets into the Redis caches of every follower so the timeline is already assembled when the user logs in. [07:01]
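  • The Trade-off: Redis keeps its data in memory. It offers optional persistence (RDB snapshots and an append-only file), but recent writes can still be lost on a crash or restart, so it is a poor fit as the only copy of important data. Twitter uses it only as a cache, with a durable database (like Manhattan or MySQL) as the primary source of truth. [07:31]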
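
A minimal sketch of fan-out on write, with the cache kept separate from the source of truth, might look like the following. Plain Python dictionaries stand in for both Redis and the durable database so the example runs on its own; in a real deployment the per-user timeline would more likely be a Redis list maintained with commands such as LPUSH and LTRIM. The `TimelineService` class, its method names, and the 800-entry cap are hypothetical and not taken from Twitter’s implementation.

```python
from collections import defaultdict, deque

TIMELINE_LIMIT = 800  # hypothetical cap on cached entries per user


class TimelineService:
    def __init__(self, limit: int = TIMELINE_LIMIT):
        self.followers = defaultdict(set)  # author -> set of followers
        self.tweets = {}                   # stand-in for the durable store
        # Stand-in for the cache: one bounded, newest-first list per user.
        self.timelines = defaultdict(lambda: deque(maxlen=limit))
        self._next_id = 1

    def follow(self, follower: str, author: str) -> None:
        self.followers[author].add(follower)

    def post(self, author: str, text: str) -> int:
        tweet_id = self._next_id
        self._next_id += 1
        # 1. Persist to the source of truth first.
        self.tweets[tweet_id] = (author, text)
        # 2. Fan out: push the tweet id into every follower's cached timeline.
        for follower in self.followers[author]:
            self.timelines[follower].appendleft(tweet_id)
        return tweet_id

    def timeline(self, user: str) -> list:
        # Read path: the timeline is already assembled, so no merge is needed.
        return [self.tweets[tweet_id] for tweet_id in self.timelines[user]]

    def rebuild_cache(self) -> None:
        """Recreate every cached timeline from the durable store after a cache loss."""
        self.timelines.clear()
        for tweet_id in sorted(self.tweets):
            author, _ = self.tweets[tweet_id]
            for follower in self.followers[author]:
                self.timelines[follower].appendleft(tweet_id)
```

The cost moves from read time to write time: posting does work proportional to the author’s follower count, while opening the app is a single lookup. Because `tweets` holds the authoritative copy, losing `timelines` costs a rebuild but no data.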

How to Choose Your Database

To make the right choice, ask yourself three questions: [08:58]

  1. What is the access pattern? Relational queries (Postgres), massive writes (Cassandra), or ultra-low latency reads (Redis)?
  2. What are you willing to sacrifice? Flexibility, write scale, or data durability?
  3. Do you actually need it? Most apps don’t need a distributed NoSQL system; a well-indexed Postgres instance can handle 95% of use cases. [09:41]
