System Design
how to build systems that scale to millions of users.
what is system design
system design is the process of defining the architecture, components, and data flow of a system to satisfy given requirements. it’s about making the right tradeoffs between performance, scalability, reliability, and cost.
two types: - high level design (HLD) → architecture, components, how they connect - low level design (LLD) → classes, databases, APIs, data structures
key concepts before anything else
latency vs throughput
- latency → time to complete one operation (ms)
- throughput → operations per unit time (req/sec)
- goal: low latency AND high throughput
availability vs consistency
- availability → system is always responding
- consistency → all users see the same data at the same time
- you often can’t have both perfectly — this is the CAP theorem
scalability
- vertical scaling (scale up) → bigger machine, more RAM/CPU
- horizontal scaling (scale out) → more machines
CAP theorem
a distributed system can only guarantee 2 of these 3:
Consistency
/\
/ \
/ \
/ \
/________\
Availability Partition
Tolerance
- CA → consistent + available (single node, no partitions)
- CP → consistent + partition tolerant (MongoDB, HBase)
- AP → available + partition tolerant (Cassandra, DynamoDB)
in real distributed systems, network partitions WILL happen, so you choose between CP or AP.
ACID vs BASE
ACID (relational databases) - Atomicity → transaction is all or nothing - Consistency → data is always valid - Isolation → transactions don’t interfere - Durability → committed data survives crashes
BASE (NoSQL databases) - Basically Available → always responds - Soft state → data may change over time - Eventually consistent → all nodes will agree eventually
DNS (Domain Name System)
how a URL becomes an IP address:
user types google.com
↓
browser checks local cache
↓
OS checks /etc/hosts
↓
recursive DNS resolver (your ISP)
↓
root nameserver → .com nameserver → google's nameserver
↓
returns IP: 142.250.80.46
↓
browser connects to IP
DNS record types: - A → domain to IPv4
- AAAA → domain to IPv6 - CNAME → alias to
another domain - MX → mail server - TXT → text
(used for verification, SPF) - NS → nameserver for
domain
load balancing
distributes traffic across multiple servers so no single server gets overwhelmed.
┌─────────┐
│ Load │
users ─────────────▶│Balancer │
└────┬────┘
┌──────────┼──────────┐
▼ ▼ ▼
┌─────────┐┌─────────┐┌─────────┐
│Server 1 ││Server 2 ││Server 3 │
└─────────┘└─────────┘└─────────┘
algorithms: - round robin → requests go to each server in order - least connections → send to server with fewest connections - IP hash → same user always goes to same server - weighted round robin → servers get different proportions - random → random server each time
types: - L4 (transport layer) → routes based on IP/TCP, fast - L7 (application layer) → routes based on HTTP content, smarter
tools: Nginx, HAProxy, AWS ALB, Cloudflare
caching
storing data in fast storage (memory) to avoid slow operations (database, network).
request comes in
↓
check cache → HIT → return cached data (fast)
↓ MISS
fetch from database
↓
store in cache
↓
return data
cache strategies:
cache aside (lazy loading)
app checks cache
if miss → app fetches DB → app writes to cache
good for read-heavy workloads
write through
app writes to cache → cache writes to DB
data is always consistent, but write latency is higher
write back (write behind)
app writes to cache → cache writes to DB asynchronously
fast writes, risk of data loss if cache fails
read through
app reads from cache → if miss, cache fetches DB automatically
cache eviction policies: - LRU (Least Recently Used) → remove least recently accessed - LFU (Least Frequently Used) → remove least accessed overall - TTL → expire after time period - FIFO → remove oldest entry
where to cache: - browser cache - CDN - reverse proxy (Nginx) - application cache (in memory) - distributed cache (Redis, Memcached)
Redis vs Memcached: - Redis → persistent, data structures, pub/sub, more features - Memcached → simpler, pure caching, multi-threaded
CDN (Content Delivery Network)
network of servers around the world that serve static content from the nearest location to the user.
user in India
↓
CDN edge server in Mumbai (200ms away)
instead of
origin server in USA (300ms away)
what to put on CDN: - images, videos, audio - CSS, JavaScript files - HTML pages (static sites) - fonts
CDN providers: Cloudflare, AWS CloudFront, Fastly, Akamai
push vs pull CDN: - push → you upload files to CDN manually - pull → CDN fetches from origin on first request, caches it
databases
relational (SQL)
structured data, ACID, joins, schema required
-- good for: financial data, user accounts, inventory
-- PostgreSQL, MySQL, SQLite
-- normalized tables
users (id, name, email)
posts (id, user_id, title, content)NoSQL types
document (MongoDB)
{
"_id": "123",
"name": "abhishek",
"posts": [
{"title": "first post", "likes": 100}
]
}good for: flexible schema, nested data, content management
key-value (Redis, DynamoDB)
"user:123" → { name: "abhishek" }
"session:abc" → { userId: 123, expires: ... }
good for: caching, sessions, leaderboards, real-time features
column-family (Cassandra, HBase)
row_key → column_family → column → value
good for: time-series data, analytics, write-heavy workloads
graph (Neo4j)
(Abhishek) -[FOLLOWS]-> (Rahul)
(Abhishek) -[LIKES]-> (Post)
good for: social networks, recommendation engines, fraud detection
when to use what
| use case | database |
|---|---|
| user accounts, orders | PostgreSQL |
| caching, sessions | Redis |
| product catalog | MongoDB |
| social graph | Neo4j |
| analytics, logs | Cassandra |
| search | Elasticsearch |
database scaling
replication
copies of database on multiple machines
┌──────────┐
writes ───▶│ Master │
└────┬─────┘
│ replicates
┌────────┼────────┐
▼ ▼ ▼
┌────────┐┌────────┐┌────────┐
│Replica ││Replica ││Replica │
└────────┘└────────┘└────────┘
reads go to replicas
- master-slave → one write node, multiple read nodes
- multi-master → multiple write nodes (complex, conflict resolution needed)
sharding (partitioning)
split data across multiple databases
users with id 1-1000 → shard 1
users with id 1001-2000 → shard 2
users with id 2001-3000 → shard 3
shard strategies: - range based → by value range (easy but hot spots) - hash based → hash(key) % num_shards (even distribution) - directory based → lookup table for shard location
problems with sharding: - joins across shards are hard - re-sharding is painful - hot shards (one shard gets all traffic)
indexing
speeds up reads by creating a sorted data structure
-- without index: scan every row
SELECT * FROM users WHERE email = 'a@b.com';
-- with index: jump directly to row
CREATE INDEX idx_email ON users(email);- B-tree index → good for range queries, equality
- Hash index → only equality queries, very fast
- Composite index → multiple columns, order matters
- indexes slow down writes, speed up reads
message queues
decouple services so they don’t need to talk directly.
producer → [queue] → consumer
instead of:
service A calls service B directly (tight coupling)
use cases: - send email after signup (async) - process payment in background - handle traffic spikes (queue absorbs burst) - ensure message delivery even if consumer is down
tools: RabbitMQ, Apache Kafka, AWS SQS, Redis Streams
Kafka vs RabbitMQ: - Kafka → high throughput, event streaming, replay messages, log - RabbitMQ → traditional queue, complex routing, lower throughput
patterns: - pub/sub → one producer, many consumers get the message - point to point → one producer, one consumer gets the message - fan out → one message goes to multiple queues
microservices vs monolith
monolith
┌─────────────────────────────┐
│ Single Application │
│ ┌──────┐ ┌──────┐ ┌──────┐│
│ │Users ││Posts ││Notify ││
│ └──────┘ └──────┘ └──────┘│
│ Single Database │
└─────────────────────────────┘
pros: simple, easy to develop initially, easy to test cons: hard to scale specific parts, one bug can crash everything
microservices
┌────────┐ ┌────────┐ ┌────────┐
│ Users │ │ Posts │ │Notify │
│Service │ │Service │ │Service │
└────┬───┘ └────┬───┘ └────┬───┘
│ │ │
┌──┴──┐ ┌──┴──┐ ┌──┴──┐
│ DB │ │ DB │ │ DB │
└─────┘ └─────┘ └─────┘
pros: scale each service independently, independent deployments, different tech stacks cons: complex, network latency, harder to test, distributed system problems
when to use what: - start with monolith, split when you have clear boundaries and scale needs - microservices if you have large team, clear domain boundaries, different scaling needs
API design
REST
GET /users → get all users
GET /users/:id → get user by id
POST /users → create user
PUT /users/:id → replace user
PATCH /users/:id → update fields
DELETE /users/:id → delete user
GET /users/:id/posts → nested resource
HTTP status codes: - 200 OK, 201 Created, 204 No Content - 400 Bad Request, 401 Unauthorized, 403 Forbidden, 404 Not Found - 429 Too Many Requests, 500 Server Error, 503 Service Unavailable
GraphQL
client requests exactly what it needs
query {
user(id: "123") {
name
email
posts {
title
likes
}
}
}pros: no over/under fetching, one endpoint, strongly typed cons: complex caching, harder to optimize on server
gRPC
binary protocol, uses protocol buffers, very fast
service UserService {
rpc GetUser (UserRequest) returns (UserResponse);
}pros: very fast, strongly typed, streaming support cons: not human readable, harder to debug
when to use: - REST → public APIs, web apps, simple CRUD - GraphQL → complex data requirements, multiple clients - gRPC → internal microservices, low latency needed
rate limiting
prevent abuse and ensure fair usage
algorithms:
token bucket
bucket has N tokens
each request uses 1 token
tokens refill at rate R per second
if no tokens → reject request
allows bursts up to bucket size
leaky bucket
requests enter bucket at any rate
bucket leaks at constant rate
if full → reject
smooth output, no bursts
fixed window
count requests in current window (e.g. per minute)
if count > limit → reject
reset count each window
simple but has edge case at window boundaries
sliding window log
store timestamp of each request
count requests in last 60 seconds
accurate but memory intensive
where to implement: - API gateway - reverse proxy (Nginx) - application code - Redis (distributed rate limiting)
consistent hashing
used in distributed caches and load balancers to minimize re-distribution when nodes are added/removed.
servers and keys are placed on a "ring"
each key is served by the next server clockwise
ring: 0 ──────────────────────── 360
server A (at 90)
server B (at 180)
server C (at 270)
key1 hashes to 50 → served by server A (next clockwise)
key2 hashes to 140 → served by server B
key3 hashes to 220 → served by server C
when a server is added/removed, only keys on that segment are remapped, not all keys.
virtual nodes: each server has multiple points on the ring for better distribution.
real-time communication
polling
client asks server every N seconds: "any updates?"
simple but wasteful, high latency
long polling
client asks server, server holds connection open
server responds when data is available
client immediately makes new request
less wasteful, but still has latency
WebSockets
persistent bidirectional connection
server can push to client anytime
great for: chat, live feeds, games, notifications
Server-Sent Events (SSE)
one-way: server pushes to client
client can't send back on same connection
good for: notifications, live scores, stock prices
proxies
forward proxy
client → forward proxy → internet
hides client, used for: VPN, bypass restrictions, caching
reverse proxy
internet → reverse proxy → servers
hides servers, used for: load balancing, SSL termination, caching, security
API gateway
clients → API gateway → microservices
single entry point, handles: auth, rate limiting, routing, logging, protocol translation
tools: Nginx, Kong, AWS API Gateway
storage types
block storage raw storage, like a hard drive use for: databases, VMs, OS example: AWS EBS
file storage files in directories (NFS, SMB) use for: shared files, home directories example: AWS EFS
object storage stores objects with metadata and URL infinite scale, cheap use for: images, videos, backups, static files example: AWS S3, MinIO
numbers every engineer should know
| operation | latency |
|---|---|
| L1 cache reference | 1 ns |
| L2 cache reference | 4 ns |
| RAM access | 100 ns |
| SSD read | 100 μs |
| HDD seek | 10 ms |
| Same datacenter | 500 μs |
| Cross-continent | 150 ms |
scale intuition: - 1 million seconds ≈ 11.5 days - 1 billion seconds ≈ 31.7 years - 1 server handles ≈ 1000 req/sec - MySQL ≈ 1000 writes/sec, 10000 reads/sec - Redis ≈ 100,000 ops/sec - Kafka ≈ millions of messages/sec
system design interview framework
when asked to design a system, follow this order:
1. clarify requirements (5 min) - what features are needed? - how many users? (read/write ratio) - what scale? (requests per second, data size) - consistency or availability?
2. estimate scale (5 min) - daily active users - requests per second = DAU × requests/day / 86400 - storage = users × data per user × time - bandwidth = requests/sec × data per request
3. high level design (15 min) - draw main components: clients, servers, DB, cache, CDN - explain data flow
4. deep dive (20 min) - focus on most important parts - discuss tradeoffs - handle edge cases
5. wrap up (5 min) - bottlenecks - future improvements - monitoring
example: design URL shortener
requirements: - shorten long URL to short URL (bit.ly/xyz) - redirect short URL to long URL - 100M URLs/day, read heavy (100:1 read to write)
estimation: - writes: 100M/day = 1200/sec - reads: 12000/sec - storage: 100M × 500 bytes = 50GB/day
design:
client → API gateway → app servers → cache (Redis)
→ database (PostgreSQL)
short URL generation: - base62 encoding (a-z, A-Z, 0-9) = 62 chars - 7 characters = 62^7 = 3.5 trillion URLs - hash long URL → take first 7 chars - or: auto increment ID → base62 encode
database schema:
urls (
id bigint primary key,
short_code varchar(10) unique,
long_url text,
user_id bigint,
created_at timestamp,
expires_at timestamp
)caching: - cache short_code → long_url in Redis - 80/20 rule: 20% of URLs get 80% of traffic - cache those 20% (hot URLs)
=^._.^= design for failure. everything fails.