Skip to content

System Design Top 10

1. Design a URL Shortener (TinyURL)

Description: Design a service like TinyURL that converts long URLs into short, unique aliases. When a user visits the short URL, they are redirected to the original long URL.

Solution

Requirements: * Functional: Shorten URL, Redirect, Custom alias (optional), Expiration (optional). * Non-Functional: High availability, Low latency, Read-heavy (100:1 ratio).

Key Concepts: * Hashing: MD5/SHA256 (too long) vs Base62 encoding. * Unique ID Generation: Auto-increment ID (predictable), UUID (too long), or Distributed ID Generator (Snowflake / KGS - Key Generation Service). * Database: Relational (MySQL) or NoSQL (DynamoDB/Cassandra) since data is simple (Key-Value).

Architecture: 1. Client sends create(long_url). 2. Web Server calls ID Generator (or uses KGS) to get a unique ID. 3. Convert ID to Base62 (e.g., ID 12345 -> tx9). 4. Store mapping <tx9, long_url> in DB and Cache. 5. Return http://tinyurl.com/tx9. 6. Redirect: Client visits short URL -> Server checks Cache/DB -> Returns HTTP 301 (Permanent) or 302 (Temporary) redirect.

2. Design a Rate Limiter

Description: Design a system that limits the number of requests a user can send to an API within a specified time window (e.g., 10 requests per second).

Solution

Requirements: * Accurate limiting, Low latency, Distributed support (works across multiple servers).

Algorithms: * Token Bucket: Tokens added at rate r. Request consumes token. (Common, handles bursts). * Leaky Bucket: Requests enter queue, processed at constant rate. (Smooths traffic). * Fixed Window Counter: Count reqs in [1:00, 1:01]. Edge case: spike at boundary. * Sliding Window Log: Store timestamps. Accurate but high memory. * Sliding Window Counter: Weighted average of previous and current window. (Balanced).

Architecture: * Middleware/Gateway: Rate limiter usually sits in API Gateway (Kong, Nginx). * Storage: Redis (fast, supports atomic INCR and Expiration). * Distributed: Use Redis Lua scripts to ensure atomicity or sticky sessions (not recommended).

3. Design a Key-Value Store (like DynamoDB/Cassandra)

Description: Design a distributed key-value store that supports put(key, value) and get(key). It must be highly available, scalable, and fault-tolerant.

Solution

Key Concepts: * CAP Theorem: Usually AP (Availability + Partition Tolerance) > CP (Consistency) for these systems (Eventual Consistency). * Data Partitioning: Consistent Hashing to distribute data across nodes evenly and minimize movement when nodes add/remove. * Replication: Replicate data to \(N\) nodes (e.g., \(N=3\)) for fault tolerance. * Consistency: Quorum Consensus (\(N, R, W\)). Strong consistency if \(R + W > N\). * Conflict Resolution: Vector Clocks or "Last Write Wins" (LWW). * Storage Engine: LSM Tree (Log-Structured Merge-tree) for fast writes (Cassandra) vs B+ Tree for reads. * Gossip Protocol: For detecting node failures and state sharing.

4. Design a Distributed Unique ID Generator (like Snowflake)

Description: Design a system to generate unique, numerical, 64-bit IDs in a distributed environment. IDs should be roughly sortable by time.

Solution

Approaches: * Multi-Master DB: auto_increment with step size (hard to scale). * UUID: 128-bit, not numeric, not sortable, index performance issues. * Twitter Snowflake (Recommended): * 1 bit: Sign bit (0). * 41 bits: Timestamp (milliseconds since epoch). Gives ~69 years. * 10 bits: Machine ID (5 bits Datacenter + 5 bits Worker). * 12 bits: Sequence number (resets every millisecond). Supports 4096 IDs/ms per machine.

Key Traits: * Time-ordered (due to timestamp). * Unique across distributed systems (due to Machine ID). * High throughput (no DB locks).

5. Design a Web Crawler

Description: Design a system that crawls the internet to index web pages.

Solution

Requirements: * Scale (billions of pages), Politeness (don't ddos), Extensibility, Robustness (handle malformed HTML).

Architecture: 1. Seed URLs: Starting point. 2. URL Frontier: Priority Queue (Redis/Kafka) managing URLs to download. Prioritizes fresh/important pages and enforces politeness (rate limits per domain). 3. DNS Resolver: Caches IP addresses. 4. HTML Downloader: Fetches page content. 5. Content Parser: Validates and parses HTML. 6. Duplicate Detection: * URL Filter: Bloom Filter to check if URL visited. * Content Dedup: Checksum (MD5) or SimHash for near-duplicate detection. 7. Data Storage: BigTable/HBase for content.

6. Design a Notification System

Description: Design a system to send millions of notifications (Push, SMS, Email) to users.

Solution

Requirements: * Reliability (don't lose messages), Deduplication, Rate limiting (don't spam), Priority (OTP > Marketing).

Architecture: * Service 1: Notification Service: Validates input, checks user preferences (Opt-in/out). * Message Queues (Kafka/RabbitMQ): Decouple sending from processing. Separate queues for different priorities/types (SMS Queue, Email Queue). * Workers: Consume from queues and call third-party providers. * Third-Party Services: APNS/FCM (Mobile), Twilio (SMS), SendGrid (Email). * Retry Mechanism: Exponential backoff for failed sends. * Rate Limiter: Prevent overwhelming users or 3rd party APIs.

7. Design a Chat System (WhatsApp/Telegram)

Description: Design a one-on-one and group chat application.

Solution

Requirements: * Real-time messaging, Online status, History, Multiple devices.

Key Concepts: * Protocol: WebSocket (bi-directional, persistent) is better than HTTP polling. * Connection Handling: Chat Server maintains WS connections. * Message Flow (1-on-1): User A -> LB -> Chat Server 1 -> Redis (find B's server) -> Chat Server 2 -> User B. * Storage: * NoSQL (Cassandra/HBase) for Chat History (Write heavy, access by chat_id + timestamp). * RDBMS for User Profile/Settings. * Unread/Push: If User B is offline, store in "Unread" DB and send Push Notification via Push Service.

8. Design a Social Media Feed (News Feed)

Description: Design the News Feed for Facebook/Instagram/Twitter. Users can post and see posts from friends/followees.

Solution

Approaches: 1. Pull Model (Fan-out on Load): * Write: User posts -> Save to DB. Fast. * Read: User views feed -> Query all followees' posts -> Merge & Sort. Slow read (\(N\) followees). Good for small scale. 2. Push Model (Fan-out on Write): * Write: User posts -> Push ID to every follower's "Feed Cache". Slow write (\(N\) followers). * Read: Read from "Feed Cache". Fast. Good for most users. 3. Hybrid (Recommended): * Push for normal users. * Pull for celebrities (Justin Bieber has 100M followers, pushing is too expensive).

Components: * Feed Service: Aggregates posts. * Cache (Redis): Stores pre-computed feed IDs. * DB: Stores actual post content.??? question "9. Design YouTube / Netflix (Video Streaming)"

Description: Design a video streaming platform where users upload and view videos.

Solution

Key Challenges: * Huge storage, Bandwidth, Latency, Various formats/resolutions.

Architecture: * Upload Flow: 1. Original video to S3 (Object Storage). 2. Trigger Transcoding Servers (via Queue). 3. Convert to multiple formats (HLS/DASH) and resolutions (360p, 1080p, 4k). * Streaming Flow: * CDN (Content Delivery Network): Cache videos at edge locations closer to users. Crucial for performance. * Adaptive Bitrate Streaming: Client detects bandwidth and requests appropriate chunk quality (e.g., drops to 360p if net is slow). * Metadata DB: SQL/NoSQL for video title, description, likes.

10. Design Google Drive / Dropbox

Description: Design a cloud file storage and synchronization service.

Solution

Key Concepts: * Block Storage: Don't upload whole file on change. Split file into blocks (e.g., 4MB). Only upload modified blocks (Delta Sync). * Compression & Encryption: Compress blocks before upload. * Deduplication: Hash blocks. If hash exists, don't upload again (save space).

Architecture: * Client: Watcher (monitors FS), Chunker (splits files), Syncer. * Block Server: Handles uploading blocks to S3/Cold Storage. * Metadata DB: Stores file structure (File Path -> List of Block hashes). Consistency is key (ACID desirable for metadata). * Notification Service: Long polling to notify other devices of changes.