1. The Question
Design a distributed Least-Recently-Used (LRU) cache that supports inserts and retrievals across multiple cache servers. The cache must be highly available and scalable; discuss API, data structures, sharding, replication, consistency trade-offs, failure handling, and operational concerns.
2. Clarifying Questions
Key clarifying questions to ask the interviewer:
- Is the cache read-heavy or write-heavy? (Typical caches are read-heavy.)
- Are keys fixed-size strings/IDs and are values blobs with size limits?
- Should the cache provide strong consistency, read-your-writes, or eventual consistency?
- Expected QPS, number of unique keys, and average object size (to estimate capacity)?
- Multi-region availability requirements and acceptable staleness on failover?
- Do we need TTLs, eviction metrics, or transactionality across multiple keys?
Assume: mostly read-heavy, keys are strings, values up to tens of KB, and the system needs high availability with relaxed consistency (configurable).
3. Requirements
Functional:
- PUT(key, value): insert/update an item.
- GET(key): retrieve an item; moves the item to the MRU position.
- DELETE(key): optional remove.
- Eviction: LRU enforced within each shard, bounded by per-shard capacity limits.
Non-functional:
- High availability and low latency (ms).
- Horizontal scalability (add/remove nodes with minimal disruption).
- Configurable consistency (strong vs eventual trade-offs).
- Monitoring and operational observability.
Constraints/assumptions:
- Cache capacity per node is bounded.
- Network partitions and node failures must be handled gracefully.
- It is acceptable to return slightly stale reads in exchange for availability, unless strict consistency is requested.
4. Scale Estimates
Make assumptions to size the system:
- QPS: 50k reads/sec and 5k writes/sec (example).
- Unique keys: 200M hot keys overall; cold keys are served by datastore.
- Average value size: 5 KB -> total hot working set ~ 1 TB.
- Node capacity: 64 GB of memory usable for cache per node -> ~13M items per node at 5 KB each and ~16 nodes to hold one copy of the ~1 TB working set; with a replication factor of 3 plus headroom, plan for roughly 50-60 nodes.
These numbers drive sharding, replication factor, and network requirements.
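The arithmetic behind these estimates, as a quick back-of-the-envelope check (Python; every constant is one of the assumptions stated above):

```python
# Back-of-the-envelope sizing from the assumptions above.
HOT_KEYS = 200_000_000            # unique hot keys
AVG_VALUE_BYTES = 5 * 10**3       # ~5 KB per value
NODE_CACHE_BYTES = 64 * 10**9     # 64 GB usable for cache per node
REPLICATION_FACTOR = 3            # assumed: 1 leader + 2 followers per shard

working_set_bytes = HOT_KEYS * AVG_VALUE_BYTES                 # ~1 TB
items_per_node = NODE_CACHE_BYTES // AVG_VALUE_BYTES           # ~12.8 million items
nodes_single_copy = -(-working_set_bytes // NODE_CACHE_BYTES)  # ceiling division -> 16 nodes
nodes_with_replicas = nodes_single_copy * REPLICATION_FACTOR   # 48, before headroom

print(items_per_node, nodes_single_copy, nodes_with_replicas)  # 12800000 16 48
```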
5. Data Model
Key-value model:
- Key: string (unique identifier)
- Value: opaque blob (bytes) or typed object
- Metadata stored per entry: size, lastAccessTs, version or sequence, optional TTL
On each cache node store:
- In-memory hashmap: key -> pointer to node in LRU list and payload
- Doubly-linked list nodes: maintain MRU at head and LRU at tail for O(1) eviction and move-to-front
Rationale: Hashmap + doubly-linked list gives O(1) get/put/evict operations for single-node LRU.
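A minimal single-node sketch of exactly this structure, assuming no concurrency control, per-entry metadata, or TTLs (class and method names are illustrative):

```python
class _Node:
    __slots__ = ("key", "value", "prev", "next")
    def __init__(self, key, value):
        self.key, self.value = key, value
        self.prev = self.next = None

class LRUCache:
    """Hashmap + doubly-linked list: O(1) get/put/evict on a single node."""
    def __init__(self, capacity: int):
        self.capacity = capacity
        self.map = {}                      # key -> _Node
        self.head = _Node(None, None)      # sentinel: head.next is the MRU entry
        self.tail = _Node(None, None)      # sentinel: tail.prev is the LRU entry
        self.head.next, self.tail.prev = self.tail, self.head

    def _unlink(self, node):
        node.prev.next, node.next.prev = node.next, node.prev

    def _push_front(self, node):
        node.next, node.prev = self.head.next, self.head
        self.head.next.prev = node
        self.head.next = node

    def get(self, key):
        node = self.map.get(key)
        if node is None:
            return None                    # cache miss
        self._unlink(node)                 # move to MRU position
        self._push_front(node)
        return node.value

    def put(self, key, value):
        node = self.map.get(key)
        if node is not None:
            node.value = value             # update in place
            self._unlink(node)
        else:
            node = _Node(key, value)
            self.map[key] = node
            if len(self.map) > self.capacity:
                lru = self.tail.prev       # evict least recently used
                self._unlink(lru)
                del self.map[lru.key]
        self._push_front(node)
```

A production shard would add per-entry metadata (size, version, TTL), locking or striped locks around these operations, and eviction metrics.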
6. API Design
Simple HTTP/gRPC API between clients (or an application-side library) and cache front ends:
- PUT /cache/{key}
  - Body: value
  - Headers: x-cache-ttl (optional), x-write-mode (sync/async)
  - Response: 200 OK or error
- GET /cache/{key}
  - Response: 200 + value OR 404 (miss)
- DELETE /cache/{key}
  - Response: 200 or 404
Client-side library responsibilities:
- Map key -> shard (consistent hashing or range map)
- Retry/timeout policies
- Local config refresh from central config service
- Optionally fallback to origin datastore on miss
Design notes:
- Expose read-preference headers (e.g., prefer-leader) when strong consistency is required.
- Support bulk GET/PUT for batching hot-key workloads.
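A sketch of what the client-side library could look like under these responsibilities (Python; the requests calls, the URL scheme, and the ring.node_for(key) interface, a consistent-hash ring like the one sketched in section 8, are illustrative assumptions):

```python
import requests

class CacheClient:
    """Thin client: maps key -> shard, applies timeouts/retries, falls back to origin."""
    def __init__(self, ring, origin, timeout=0.05, retries=2):
        self.ring = ring          # consistent-hash ring: ring.node_for(key) -> "host:port"
        self.origin = origin      # callable key -> value, the authoritative datastore
        self.timeout = timeout
        self.retries = retries

    def _url(self, key):
        return f"http://{self.ring.node_for(key)}/cache/{key}"

    def get(self, key):
        for _ in range(self.retries + 1):
            try:
                resp = requests.get(self._url(key), timeout=self.timeout)
                if resp.status_code == 200:
                    return resp.content           # cache hit
                if resp.status_code == 404:
                    break                         # definite miss: stop retrying
            except requests.RequestException:
                continue                          # timeout/connection error: retry
        value = self.origin(key)                  # fall back to the origin datastore
        if value is not None:
            self.put(key, value)                  # read-through population
        return value

    def put(self, key, value, ttl=None, write_mode="async"):
        headers = {"x-write-mode": write_mode}
        if ttl is not None:
            headers["x-cache-ttl"] = str(ttl)
        requests.put(self._url(key), data=value, headers=headers, timeout=self.timeout)
```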
7. High-Level Architecture
Components:
- Clients / Application Servers: include a cache client library that knows shard mapping.
- API Gateway / Load Balancer: optional for multi-tenant access.
- Configuration Service: Zookeeper/Consul/Eureka storing shard mapping, cluster membership, and leader election.
- Cache Shards: dedicated cache cluster sharded by key-space (consistent hashing or range-based).
- Each shard has a primary (leader) and N replicas for reads/high availability.
- Origin Datastore: authoritative source for cache misses.
- Monitoring & Metrics: eviction rates, hit/miss ratios, per-shard QPS, memory pressure.
Data flow:
- Client computes target shard and sends GET/PUT to that shard's node (or LB).
- Shard leader handles writes and coordinates replication to followers.
- Reads can go to leader or followers depending on consistency settings.
- Config service keeps clients updated about membership and shard boundaries.
8. Detailed Design Decisions
Sharding strategy:
- Prefer consistent hashing to minimize key movement when nodes added/removed.
- Use virtual nodes to balance load across heterogeneous machines.
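A minimal consistent-hash ring with virtual nodes, as a sketch (Python; MD5 and 100 vnodes per physical node are arbitrary choices for illustration):

```python
import bisect
import hashlib

class ConsistentHashRing:
    """Consistent hashing with virtual nodes: adding/removing a node remaps only ~1/N of keys."""
    def __init__(self, nodes=(), vnodes=100):
        self.vnodes = vnodes
        self._ring = []            # sorted list of (hash_position, node)
        for node in nodes:
            self.add_node(node)

    @staticmethod
    def _hash(s: str) -> int:
        return int(hashlib.md5(s.encode()).hexdigest(), 16)

    def add_node(self, node: str):
        for i in range(self.vnodes):
            bisect.insort(self._ring, (self._hash(f"{node}#{i}"), node))

    def remove_node(self, node: str):
        self._ring = [(h, n) for h, n in self._ring if n != node]

    def node_for(self, key: str) -> str:
        h = self._hash(key)
        idx = bisect.bisect(self._ring, (h, "")) % len(self._ring)  # first vnode clockwise
        return self._ring[idx][1]

ring = ConsistentHashRing(["cache-1:7000", "cache-2:7000", "cache-3:7000"])
print(ring.node_for("user:42"))    # e.g. "cache-3:7000"
```

With virtual nodes, removing cache-2:7000 only remaps keys that hashed to its virtual points, roughly a third of the keyspace here, rather than reshuffling everything.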
Replication and consistency:
- Replication factor (R) = e.g., 3 (1 leader + 2 followers).
- Writes: sent to leader; choose write acknowledgement strategy:
- Acknowledge after leader write (low latency, eventual consistency)
- Acknowledge after majority replication (quorum) for stronger consistency
- Reads: configurable
- Read-from-leader for strong consistency
- Read-from-follower for lower latency and higher throughput (accept eventual consistency)
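A simplified sketch of the leader's write path with a configurable acknowledgement mode (Python; local_store, followers, and replicate_fn are injected, assumed interfaces, and a real system would add per-key ordering, retries, and hinted handoff):

```python
import concurrent.futures

class ShardLeader:
    """Leader write path: ack after the local write (eventual) or after a majority (quorum)."""
    def __init__(self, local_store, followers, replicate_fn):
        self.local = local_store          # e.g., the LRUCache sketched in section 5
        self.followers = followers        # list of follower endpoints
        self.replicate = replicate_fn     # (follower, key, value) -> bool, assumed transport
        self.pool = concurrent.futures.ThreadPoolExecutor(max_workers=8)

    def put(self, key, value, ack="leader"):
        self.local.put(key, value)        # always apply locally first
        futures = [self.pool.submit(self.replicate, f, key, value) for f in self.followers]
        if ack == "leader":
            return True                   # eventual consistency: replication finishes in background
        # ack == "quorum": wait until a majority of the replica set (leader + followers) has the write
        acks = 1                                           # the leader's own write
        needed = (1 + len(self.followers)) // 2 + 1        # majority of the full replica set
        if acks >= needed:
            return True
        for fut in concurrent.futures.as_completed(futures):
            try:
                ok = fut.result()
            except Exception:
                ok = False                # replica unreachable or errored
            if ok:
                acks += 1
                if acks >= needed:
                    return True
        return False                      # quorum not reached; caller may retry or fail the write
```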
Eviction (LRU) within shard:
- Each shard maintains an in-memory hashmap + doubly-linked list for precise LRU.
- On GET/PUT, move entry to head (MRU).
- On capacity exceed, evict from tail (LRU).
Cross-node eviction and global LRU:
- Global LRU is expensive; instead maintain per-shard LRU and size-based partitioning so each shard enforces capacity.
- Optionally use dynamic load rebalancing (move hot keys or adjust virtual nodes) to avoid hotspots.
Failure handling & rebalancing:
- Use configuration service (Zookeeper/Consul) for membership and leader election (or Raft-based clusters).
- When a node dies: config service removes it, clients update mapping, keys previously on the node are remapped (consistent hashing) to other nodes.
- Re-replication: background process copies missing replicas from leader or origin store.
Cache warming and cold starts:
- On node addition, optionally prefetch hot keys from origin or replicate from existing replicas to avoid cold misses.
Operational concerns:
- Monitor memory pressure; under sustained pressure, apply backpressure on writes or evict more aggressively.
- Expose metrics: hits, misses, evictions, latencies, replication lag.
9. Bottlenecks & Scaling
Potential bottlenecks and mitigations:
- Hot keys: single-key hotspots can overload a shard; mitigate with key-splitting, request coalescing, or client-side sharding of large items (see the request-coalescing sketch after this list).
- Config service: single point of orchestration; run a highly available cluster (Raft/Zookeeper) with monitoring.
- Rebalancing cost: moving many keys on scale-out can cause heavy network I/O; perform incremental migration and throttling.
- Network bandwidth: replication and rebalancing consume bandwidth; use efficient serialization and chunking.
- GC/heap pauses: prefer off-heap storage or languages/runtimes with predictable memory behavior; tune GC or use native memory caches.
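One mitigation from the list above, request coalescing (often called single-flight), ensures that concurrent misses on the same hot key trigger only one origin fetch while the other callers wait; a thread-based sketch, assuming a caller-supplied loader callable:

```python
import threading

class SingleFlight:
    """Coalesce concurrent loads of the same key: one caller hits the origin, the rest wait."""
    def __init__(self):
        self._lock = threading.Lock()
        self._inflight = {}                     # key -> (Event, shared result holder)

    def load(self, key, loader):
        with self._lock:
            entry = self._inflight.get(key)
            if entry is None:
                entry = (threading.Event(), {})
                self._inflight[key] = entry
                owner = True                    # this caller performs the fetch
            else:
                owner = False                   # another caller is already fetching
        event, result = entry
        if owner:
            try:
                result["value"] = loader(key)   # the single origin/datastore fetch
            finally:
                with self._lock:
                    del self._inflight[key]
                event.set()                     # wake up waiters
        else:
            event.wait()
        # If the loader raised, waiters see None (a miss) and may retry;
        # a production version would propagate the error instead.
        return result.get("value")
```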
Scaling directions:
- Horizontal: add more shards (consistent hashing) and replicas.
- Vertical: use larger memory nodes for hot partitions.
- Multi-tier: combine local in-process LRU (per-app, tiny cache) + remote distributed cache for larger working set to reduce latency and load.
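A sketch of the multi-tier read path from the last bullet (Python; local is a small in-process LRU such as the one in section 5, remote is the distributed cache client, and origin is the authoritative datastore; all three are assumed, injected interfaces):

```python
def tiered_get(key, local, remote, origin):
    """L1 (in-process LRU) -> L2 (distributed cache) -> origin datastore."""
    value = local.get(key)
    if value is not None:
        return value                      # L1 hit: no network round trip
    value = remote.get(key)               # L2: shared distributed cache
    if value is None:
        value = origin(key)               # authoritative source
        if value is not None:
            remote.put(key, value)        # populate the shared tier
    if value is not None:
        local.put(key, value)             # populate the in-process tier
    return value
```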
10. Follow-up Questions / Extensions
Possible extensions to discuss:
- Provide TTL-based expiry alongside LRU and combined eviction policies.
- Support multi-key atomic operations (compare-and-swap, transactions) and implications for consistency.
- Hot-key mitigation strategies (client-side sharding, request batching, or adaptive replication).
- Integrate with CDN or edge caches for geo-distribution.
- Cache invalidation strategies for updates in the origin datastore.
- Advanced routing: client-side consistent hashing vs centralized proxying and pros/cons.
- Security: authentication, encryption in transit, and multi-tenant isolation.
11. Wrap-up
Summary:
- Single-node LRU: implement with hashmap + doubly-linked list for O(1) ops.
- Distributed: shard the keyspace (consistent hashing preferred), run shards with leader+replica topology, and use a config service for membership.
- Trade-offs: choose replication/ack strategy to balance latency vs consistency; per-shard LRU avoids global coordination.
- Operationally: plan for hot keys, rebalancing costs, monitoring, and capacity management.
This design provides a balanced solution that prioritizes availability and performance while offering configurable consistency guarantees.