1. The Question
Design a real-time messaging system like Facebook Messenger that supports one-to-one and group chats, presence (online/offline), media (images/videos) uploads, read receipts, and push notifications. The system must be highly available, low-latency, and able to scale to millions of users.
2. Clarifying Questions
- Targets: monthly active users, messages/day, messages/sec peak?
- Consistency vs availability: can we tolerate eventual consistency for message delivery ordering or user status?
- Retention: how long to keep message history?
- Device model: multiple devices per user?
- Third-party integrations: push providers, storage (S3)?
3. Requirements
Functional:
- Real-time text messaging (1:1 & group).
- Presence/online status.
- Media uploads (images/videos) with CDN-backed delivery.
- Read receipts and delivery acknowledgements.
- Push notifications for offline users.
Non-functional:
- Low latency (sub-200ms end-to-end ideally).
- High availability and durability.
- Horizontal scalability to millions of users and sustained high message throughput.
- Secure delivery and access control so messages go only to intended recipients.
4. Scale Estimates
Example assumptions to drive design:
- 50M monthly active users (MAU).
- Peak concurrent connections: 5M daily concurrent users.
- Average messages per active user per day: 20 -> 1B messages/day.
- Peak writes: ~15k - 30k messages/sec depending on distribution.
- Average message size (text): small; media occupies larger object storage and CDN bandwidth.
Use these to size servers, pub/sub throughput, DB shards, and CDN budgets.
5. Data Model
Key entities:
- users: { id, username, display_name, last_active_timestamp }
- conversations: { id, name?, type (dm/group), created_at }
- conversation_users: { conversation_id, user_id, joined_at, role }
- messages: { id, conversation_id, sender_id, timestamp, content_text, media_url, metadata (read/delivered) }
Storage decisions:
- Use a wide-column / distributed NoSQL (Cassandra/HBase) for messages to enable high write throughput and easy sharding.
- Store media in object storage (S3) and save URLs in messages.
- Keep indexes or materialized views for per-user conversation lists (for fast lookups).
6. API Design
Core APIs (examples):
- POST /v1/messages -> send message (websocket or REST fallback)
- GET /v1/conversations -> list user's conversations
- GET /v1/conversations/{id}/messages?limit=&before= -> fetch history
- POST /v1/media -> get upload URL / upload media to object store
- WebSocket endpoint: /v1/socket for real-time bi-directional messaging and presence updates
Behavior:
- Clients maintain a persistent WebSocket. If unavailable, long-polling or polling can be fallback.
- Messages sent over WebSocket are acknowledged by server; client may retry on failures.
7. High-Level Architecture
Components:
- Load balancer / gateway -> routes incoming connections to stateless API/chat servers.
- Chat/API servers -> maintain WebSocket connections, authenticate clients, publish/subscribe to message bus.
- Message service / Pub/Sub (Kafka, Pulsar, or Redis Streams) -> broker messages between servers.
- Storage DB (Cassandra) -> persist messages and conversation metadata.
- Cache (Redis) -> hot message caches, recent conversation lists.
- Object storage (S3) + CDN -> media storage and distribution.
- Notification service -> sends push notifications (APNs/FCM) for offline users.
- Monitoring, metrics, and tracing for observability.
Flow: client -> WebSocket -> chat server -> persist message -> publish to pub/sub -> subscribed chat servers push to connected recipients -> notification service for offline recipients.
8. Detailed Design Decisions
- Protocol: WebSockets for low-latency bi-directional push. Fallback to long polling for unsupported clients.
- Inter-server messaging: Durable pub/sub (Kafka/Pulsar) to fan-out messages to servers hosting recipient connections.
- Database: Choose a partition-friendly NoSQL (Cassandra) prioritized for availability and write throughput; use time-based partition keys and message TTLs if needed.
- Caching: Redis for recently-accessed conversations/messages to reduce DB reads.
- Media: Pre-signed uploads to S3; serve via CDN to reduce latency and bandwidth cost.
- Presence: Store lightweight presence in an in-memory store (Redis with key per user) and propagate via pub/sub for cross-server presence notifications.
- Ordering & delivery: Per-conversation sequence numbers; server assigns message IDs and timestamps. Accept eventual consistency for ordering across datacenters with causal guarantees where needed.
9. Bottlenecks & Scaling
- WebSocket connection limits: scale by sharding connections across many chat servers and load-balancing layer. Use horizontal autoscaling.
- Pub/Sub throughput: provision partitions/topics to meet write/read throughput; use consumer groups per chat server pool.
- DB write throughput: shard by conversation or user, add replicas; use batching for writes, and compaction tuning.
- Hot conversations: cache hot groups and implement rate limits and throttling; use separate partitioning strategy for extremely large groups.
- Cross-region latency: replicate messages asynchronously across regions; route users to nearest region and use push notifications for cross-region device sync.
- Media bandwidth/cost: rely on CDN, use adaptive image/video quality and client-driven downloads.
Mitigations: partitioning, autoscaling, backpressure, batching, and monitoring with alerting.
10. Follow-up Questions / Extensions
- End-to-end encryption: design E2E key exchange and storage without server access to plaintext (like Signal/Messenger Secret Conversations).
- Message search: index messages (ElasticSearch) considering privacy and retention.
- Message reactions, typing indicators, read receipts: additional events over WebSocket using same pub/sub.
- Multi-device sync: ensure consistent read/delivery states across devices; design per-device ACKs and a centralized sync store.
- Compliance & retention: support legal holds and user data export/delete workflows.
11. Wrap-up
Summary: Use WebSockets for real-time connectivity, a distributed pub/sub to route messages across stateless chat servers, a scalable NoSQL store for message persistence, Redis for caching/presence, and S3+CDN for media. Focus on partitioning, availability, and observability to achieve scale and reliability.