System Design/misc/Design Facebook Messenger

Design Facebook Messenger

HARD12 minmessagingwebsocketspubsubnosqlscalabilitycdncaching
Asked at: Misc

High-level design for a real-time messaging system (one-to-one and group chat) covering protocols, architecture, data model, and scaling.

1. The Question

Design a real-time messaging system like Facebook Messenger that supports one-to-one and group chats, presence (online/offline), media (images/videos) uploads, read receipts, and push notifications. The system must be highly available, low-latency, and able to scale to millions of users.

2. Clarifying Questions

  • Targets: monthly active users, messages/day, messages/sec peak?
  • Consistency vs availability: can we tolerate eventual consistency for message delivery ordering or user status?
  • Retention: how long to keep message history?
  • Device model: multiple devices per user?
  • Third-party integrations: push providers, storage (S3)?

3. Requirements

Functional:

  • Real-time text messaging (1:1 & group).
  • Presence/online status.
  • Media uploads (images/videos) with CDN-backed delivery.
  • Read receipts and delivery acknowledgements.
  • Push notifications for offline users.

Non-functional:

  • Low latency (sub-200ms end-to-end ideally).
  • High availability and durability.
  • Horizontal scalability to millions of users and sustained high message throughput.
  • Secure delivery and access control so messages go only to intended recipients.

4. Scale Estimates

Example assumptions to drive design:

  • 50M monthly active users (MAU).
  • Peak concurrent connections: 5M daily concurrent users.
  • Average messages per active user per day: 20 -> 1B messages/day.
  • Peak writes: ~15k - 30k messages/sec depending on distribution.
  • Average message size (text): small; media occupies larger object storage and CDN bandwidth.

Use these to size servers, pub/sub throughput, DB shards, and CDN budgets.

5. Data Model

Key entities:

  • users: { id, username, display_name, last_active_timestamp }
  • conversations: { id, name?, type (dm/group), created_at }
  • conversation_users: { conversation_id, user_id, joined_at, role }
  • messages: { id, conversation_id, sender_id, timestamp, content_text, media_url, metadata (read/delivered) }

Storage decisions:

  • Use a wide-column / distributed NoSQL (Cassandra/HBase) for messages to enable high write throughput and easy sharding.
  • Store media in object storage (S3) and save URLs in messages.
  • Keep indexes or materialized views for per-user conversation lists (for fast lookups).

6. API Design

Core APIs (examples):

  • POST /v1/messages -> send message (websocket or REST fallback)
  • GET /v1/conversations -> list user's conversations
  • GET /v1/conversations/{id}/messages?limit=&before= -> fetch history
  • POST /v1/media -> get upload URL / upload media to object store
  • WebSocket endpoint: /v1/socket for real-time bi-directional messaging and presence updates

Behavior:

  • Clients maintain a persistent WebSocket. If unavailable, long-polling or polling can be fallback.
  • Messages sent over WebSocket are acknowledged by server; client may retry on failures.

7. High-Level Architecture

Components:

  • Load balancer / gateway -> routes incoming connections to stateless API/chat servers.
  • Chat/API servers -> maintain WebSocket connections, authenticate clients, publish/subscribe to message bus.
  • Message service / Pub/Sub (Kafka, Pulsar, or Redis Streams) -> broker messages between servers.
  • Storage DB (Cassandra) -> persist messages and conversation metadata.
  • Cache (Redis) -> hot message caches, recent conversation lists.
  • Object storage (S3) + CDN -> media storage and distribution.
  • Notification service -> sends push notifications (APNs/FCM) for offline users.
  • Monitoring, metrics, and tracing for observability.

Flow: client -> WebSocket -> chat server -> persist message -> publish to pub/sub -> subscribed chat servers push to connected recipients -> notification service for offline recipients.

8. Detailed Design Decisions

  • Protocol: WebSockets for low-latency bi-directional push. Fallback to long polling for unsupported clients.
  • Inter-server messaging: Durable pub/sub (Kafka/Pulsar) to fan-out messages to servers hosting recipient connections.
  • Database: Choose a partition-friendly NoSQL (Cassandra) prioritized for availability and write throughput; use time-based partition keys and message TTLs if needed.
  • Caching: Redis for recently-accessed conversations/messages to reduce DB reads.
  • Media: Pre-signed uploads to S3; serve via CDN to reduce latency and bandwidth cost.
  • Presence: Store lightweight presence in an in-memory store (Redis with key per user) and propagate via pub/sub for cross-server presence notifications.
  • Ordering & delivery: Per-conversation sequence numbers; server assigns message IDs and timestamps. Accept eventual consistency for ordering across datacenters with causal guarantees where needed.

9. Bottlenecks & Scaling

  • WebSocket connection limits: scale by sharding connections across many chat servers and load-balancing layer. Use horizontal autoscaling.
  • Pub/Sub throughput: provision partitions/topics to meet write/read throughput; use consumer groups per chat server pool.
  • DB write throughput: shard by conversation or user, add replicas; use batching for writes, and compaction tuning.
  • Hot conversations: cache hot groups and implement rate limits and throttling; use separate partitioning strategy for extremely large groups.
  • Cross-region latency: replicate messages asynchronously across regions; route users to nearest region and use push notifications for cross-region device sync.
  • Media bandwidth/cost: rely on CDN, use adaptive image/video quality and client-driven downloads.

Mitigations: partitioning, autoscaling, backpressure, batching, and monitoring with alerting.

10. Follow-up Questions / Extensions

  • End-to-end encryption: design E2E key exchange and storage without server access to plaintext (like Signal/Messenger Secret Conversations).
  • Message search: index messages (ElasticSearch) considering privacy and retention.
  • Message reactions, typing indicators, read receipts: additional events over WebSocket using same pub/sub.
  • Multi-device sync: ensure consistent read/delivery states across devices; design per-device ACKs and a centralized sync store.
  • Compliance & retention: support legal holds and user data export/delete workflows.

11. Wrap-up

Summary: Use WebSockets for real-time connectivity, a distributed pub/sub to route messages across stateless chat servers, a scalable NoSQL store for message persistence, Redis for caching/presence, and S3+CDN for media. Focus on partitioning, availability, and observability to achieve scale and reliability.

Ready to practice this question?

Run a mock system design interview with AI coaching and detailed feedback.