
Design a Distributed LRU Cache

MEDIUM · 12 min · cache · distributed-systems · lru · sharding · consistency · replication
Asked at: Stripe

Design a distributed Least-Recently-Used cache with sharding, replication, eviction, and consistency trade-offs.

1. The Question

Design a distributed Least-Recently-Used (LRU) cache that supports inserts and retrievals across multiple cache servers. The cache must be highly available and scalable; discuss API, data structures, sharding, replication, consistency trade-offs, failure handling, and operational concerns.

2. Clarifying Questions

Key clarifying questions to ask the interviewer:

  • Is the cache read-heavy or write-heavy? (Typical caches are read-heavy.)
  • Are keys fixed-size strings/IDs and are values blobs with size limits?
  • Should the cache provide strong consistency, read-your-writes, or eventual consistency?
  • Expected QPS, number of unique keys, and average object size (to estimate capacity)?
  • Multi-region availability requirements and acceptable staleness on failover?
  • Do we need TTLs, eviction metrics, or transactionality across multiple keys?

Assume: mostly read-heavy, keys are strings, values up to tens of KB, and the system needs high availability with relaxed consistency (configurable).

3. Requirements

Functional:

  • PUT(key, value): insert/update an item.
  • GET(key): retrieve an item and move it to the MRU position.
  • DELETE(key): optional removal.
  • Eviction: per-shard LRU with capacity limits.

Non-functional:

  • High availability and low latency (ms).
  • Horizontal scalability (add/remove nodes with minimal disruption).
  • Configurable consistency (strong vs eventual trade-offs).
  • Monitoring and operational observability.

Constraints/assumptions:

  • Cache capacity per node is bounded.
  • Network partitions and node failures must be handled gracefully.
  • Returning slightly stale reads in exchange for availability is acceptable, unless strict consistency is requested.

4. Scale Estimates

Make assumptions to size the system:

  • QPS: 50k reads/sec and 5k writes/sec (example).
  • Unique keys: 200M hot keys overall; cold keys are served by datastore.
  • Average value size: 5 KB -> total hot working set ~ 1 TB.
  • Node capacity: 64 GB of memory usable for cache per node -> ~13M items per node at 5 KB each, so ~16 nodes hold one copy of the working set (roughly 48 with a replication factor of 3, plus headroom).

These numbers drive sharding, replication factor, and network requirements.
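
To sanity-check these figures, here is a quick back-of-the-envelope script (all workload numbers are the assumptions above, not measurements):

```python
# Back-of-the-envelope capacity math for the assumed workload.
KB, GB, TB = 1024, 1024**3, 1024**4

hot_keys = 200_000_000            # assumed unique hot keys
avg_value = 5 * KB                # assumed average value size
working_set = hot_keys * avg_value
print(f"working set ~ {working_set / TB:.2f} TB")       # ~0.93 TB

node_mem = 64 * GB                # usable cache memory per node
items_per_node = node_mem // avg_value
print(f"items per node ~ {items_per_node / 1e6:.1f}M")  # ~13.4M

replication = 3
nodes = -(-working_set * replication // node_mem)       # ceiling division
print(f"nodes at RF={replication} ~ {nodes}")           # ~45, before headroom
```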

5. Data Model

Key-value model:

  • Key: string (unique identifier)
  • Value: opaque blob (bytes) or typed object
  • Metadata stored per entry: size, lastAccessTs, version or sequence, optional TTL

On each cache node store:

  • In-memory hashmap: key -> pointer to node in LRU list and payload
  • Doubly-linked list nodes: maintain MRU at head and LRU at tail for O(1) eviction and move-to-front

Rationale: Hashmap + doubly-linked list gives O(1) get/put/evict operations for single-node LRU.
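
A minimal single-node sketch of that structure (Python; capacity counted in entries rather than bytes, and no locking, for brevity):

```python
class _Node:
    __slots__ = ("key", "value", "prev", "next")

    def __init__(self, key=None, value=None):
        self.key, self.value = key, value
        self.prev = self.next = None

class LRUCache:
    """Hashmap + doubly-linked list: O(1) get, put, and evict."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.map = {}                            # key -> _Node
        self.head, self.tail = _Node(), _Node()  # sentinels
        self.head.next, self.tail.prev = self.tail, self.head

    def _unlink(self, node):
        node.prev.next, node.next.prev = node.next, node.prev

    def _push_front(self, node):                 # head.next is the MRU slot
        node.prev, node.next = self.head, self.head.next
        self.head.next.prev = node
        self.head.next = node

    def get(self, key):
        node = self.map.get(key)
        if node is None:
            return None                          # miss
        self._unlink(node)                       # touch: move to MRU
        self._push_front(node)
        return node.value

    def put(self, key, value):
        node = self.map.get(key)
        if node:                                 # update in place
            node.value = value
            self._unlink(node)
        else:
            node = _Node(key, value)
            self.map[key] = node
            if len(self.map) > self.capacity:
                lru = self.tail.prev             # evict from the tail
                self._unlink(lru)
                del self.map[lru.key]
        self._push_front(node)
```

A production version would also track byte sizes, honor TTLs, and guard the structure with a lock (or shard it across threads).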

6. API Design

Simple HTTP/GRPC API between clients (or application library) and cache frontends:

  • PUT /cache/{key}

    • Body: value
    • Headers: x-cache-ttl (optional), x-write-mode (sync/async)
    • Response: 200 OK or error
  • GET /cache/{key}

    • Response: 200 + value OR 404 (miss)
  • DELETE /cache/{key}

    • Response: 200 or 404

Client-side library responsibilities:

  • Map key -> shard (consistent hashing or range map)
  • Retry/timeout policies
  • Local config refresh from central config service
  • Optionally fall back to the origin datastore on a miss (see the client sketch at the end of this section)

Design notes:

  • Expose read-preference headers (e.g., prefer-leader) if strong consistency is required.
  • Support bulk GET/PUT for batching hot-key workloads.
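
A minimal client-library sketch tying this section together (assumptions: shards speak the HTTP API above, HashRing is the consistent-hashing helper sketched in section 8, and origin_get is the caller's datastore loader):

```python
import requests

class CacheClient:
    def __init__(self, ring, origin_get, timeout=0.05, retries=2):
        self.ring = ring              # key -> node URL via consistent hashing
        self.origin_get = origin_get  # fallback loader for misses
        self.timeout = timeout
        self.retries = retries

    def get(self, key):
        url = f"{self.ring.node_for(key)}/cache/{key}"
        for _ in range(self.retries + 1):
            try:
                resp = requests.get(url, timeout=self.timeout)
            except requests.RequestException:
                continue                         # timeout/conn error: retry
            if resp.status_code == 200:
                return resp.content
            if resp.status_code == 404:          # miss: read-through to origin
                value = self.origin_get(key)
                if value is not None:
                    self.put(key, value)         # populate the cache
                return value
        return self.origin_get(key)              # cache unreachable: degrade

    def put(self, key, value, ttl=None):
        headers = {"x-cache-ttl": str(ttl)} if ttl is not None else {}
        url = f"{self.ring.node_for(key)}/cache/{key}"
        requests.put(url, data=value, headers=headers, timeout=self.timeout)
```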

7. High-Level Architecture

Components:

  • Clients / Application Servers: include a cache client library that knows shard mapping.
  • API Gateway / Load Balancer: optional for multi-tenant access.
  • Configuration Service: Zookeeper/Consul/Eureka storing shard mapping, cluster membership, and leader election.
  • Cache Shards: dedicated cache cluster sharded by key-space (consistent hashing or range-based).
    • Each shard has a primary (leader) and N replicas for reads/high availability.
  • Origin Datastore: authoritative source for cache misses.
  • Monitoring & Metrics: eviction rates, hit/miss ratios, per-shard QPS, memory pressure.

Data flow:

  • Client computes target shard and sends GET/PUT to that shard's node (or LB).
  • Shard leader handles writes and coordinates replication to followers.
  • Reads can go to leader or followers depending on consistency settings.
  • Config service keeps clients updated about membership and shard boundaries.

8. Detailed Design Decisions

Sharding strategy:

  • Prefer consistent hashing to minimize key movement when nodes added/removed.
  • Use virtual nodes to balance load across heterogeneous machines (see the ring sketch below).
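
A minimal ring sketch (vnodes=100 is an arbitrary choice; any stable hash works in place of MD5):

```python
import bisect
import hashlib

class HashRing:
    """Consistent hashing with virtual nodes for smoother load balance."""

    def __init__(self, nodes, vnodes=100):
        points = []                              # (hash, node) ring points
        for node in nodes:
            for i in range(vnodes):
                points.append((self._hash(f"{node}#{i}"), node))
        points.sort()
        self._hashes = [h for h, _ in points]
        self._nodes = [n for _, n in points]

    @staticmethod
    def _hash(s: str) -> int:
        return int.from_bytes(hashlib.md5(s.encode()).digest()[:8], "big")

    def node_for(self, key: str) -> str:
        idx = bisect.bisect(self._hashes, self._hash(key)) % len(self._hashes)
        return self._nodes[idx]

ring = HashRing(["cache-1:7000", "cache-2:7000", "cache-3:7000"])
print(ring.node_for("user:42"))  # adding a node later moves only ~1/N of keys
```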

Replication and consistency:

  • Replication factor (R): typically 3 (1 leader + 2 followers).
  • Writes: sent to leader; choose write acknowledgement strategy:
    • Acknowledge after leader write (low latency, eventual consistency)
    • Acknowledge after majority replication (quorum) for stronger consistency (see the quorum-write sketch after this list)
  • Reads: configurable
    • Read-from-leader for strong consistency
    • Read-from-follower for lower latency and higher throughput (accept eventual consistency)
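
A sketch of the acknowledgement choice (send_to_replica and leader_store are hypothetical stand-ins for the node's transport and local store):

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

_pool = ThreadPoolExecutor(max_workers=8)       # shared replication pool

def replicated_put(key, value, leader_store, replicas, send_to_replica,
                   ack_mode="leader"):
    """ack_mode='leader': ack after the local write (fast, eventual consistency).
    ack_mode='quorum': ack once a majority of all copies have the write."""
    leader_store.put(key, value)                # leader applies the write first
    futures = [_pool.submit(send_to_replica, r, key, value) for r in replicas]
    if ack_mode == "leader":
        return True                     # replication continues in the background
    acks = 1                            # the leader's own copy counts
    majority = (len(replicas) + 1) // 2 + 1   # majority of leader + replicas
    for fut in as_completed(futures):
        try:
            fut.result()
            acks += 1
        except Exception:
            pass                        # failed replica: re-replication fixes it
        if acks >= majority:
            return True
    return False
```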

Eviction (LRU) within shard:

  • Each shard maintains an in-memory hashmap + doubly-linked list for precise LRU.
  • On GET/PUT, move entry to head (MRU).
  • When capacity is exceeded, evict from the tail (LRU).

Cross-node eviction and global LRU:

  • Global LRU is expensive; instead maintain per-shard LRU and size-based partitioning so each shard enforces capacity.
  • Optionally use dynamic load rebalancing (move hot keys or adjust virtual nodes) to avoid hotspots.

Failure handling & rebalancing:

  • Use configuration service (Zookeeper/Consul) for membership and leader election (or Raft-based clusters).
  • When a node dies: config service removes it, clients update mapping, keys previously on the node are remapped (consistent hashing) to other nodes.
  • Re-replication: background process copies missing replicas from leader or origin store.

Cache warming and cold starts:

  • On node addition, optionally prefetch hot keys from origin or replicate from existing replicas to avoid cold misses.

Operational concerns:

  • Monitor memory pressure; under pressure, apply backpressure to writes or evict more aggressively.
  • Expose metrics: hits, misses, evictions, latencies, replication lag.

9. Bottlenecks & Scaling

Potential bottlenecks and mitigations:

  • Hot keys: single-key hotspots can overload a shard; mitigate by key-splitting, request coalescing (see the sketch after this list), or client-side sharding of large items.
  • Config service: single point of orchestration; run a highly available cluster (Raft/Zookeeper) with monitoring.
  • Rebalancing cost: moving many keys on scale-out can cause heavy network I/O; perform incremental migration and throttling.
  • Network bandwidth: replication and rebalancing consume bandwidth; use efficient serialization and chunking.
  • GC/heap pauses: prefer off-heap storage or languages/runtimes with predictable memory behavior; tune GC or use native memory caches.
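
A sketch of the request-coalescing ("singleflight") idea mentioned above: concurrent misses for one hot key share a single origin fetch (not a specific library's API):

```python
import threading

class SingleFlight:
    """Coalesce concurrent loads of the same key into one origin call."""

    def __init__(self):
        self._lock = threading.Lock()
        self._inflight = {}                 # key -> (event, result box)

    def do(self, key, loader):
        with self._lock:
            entry = self._inflight.get(key)
            owner = entry is None
            if owner:                       # first caller fetches for everyone
                entry = (threading.Event(), {})
                self._inflight[key] = entry
        event, box = entry
        if owner:
            try:
                box["value"] = loader(key)  # the single origin fetch
            finally:
                with self._lock:
                    del self._inflight[key]
                event.set()                 # wake all waiters
        else:
            event.wait()                    # piggyback on the in-flight fetch
        return box.get("value")
```

Wired into the client's miss path, this turns a stampede of N identical misses into one datastore read.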

Scaling directions:

  • Horizontal: add more shards (consistent hashing) and replicas.
  • Vertical: use larger memory nodes for hot partitions.
  • Multi-tier: combine local in-process LRU (per-app, tiny cache) + remote distributed cache for larger working set to reduce latency and load.

10. Follow-up Questions / Extensions

Possible extensions to discuss:

  • Provide TTL-based expiry alongside LRU and combined eviction policies.
  • Support multi-key atomic operations (compare-and-swap, transactions) and implications for consistency.
  • Hot-key mitigation strategies (client-side sharding, request batching, or adaptive replication).
  • Integrate with CDN or edge caches for geo-distribution.
  • Cache invalidation strategies for updates in the origin datastore.
  • Advanced routing: client-side consistent hashing vs centralized proxying and pros/cons.
  • Security: authentication, encryption in transit, and multi-tenant isolation.

11. Wrap-up

Summary:

  • Single-node LRU: implement with hashmap + doubly-linked list for O(1) ops.
  • Distributed: shard the keyspace (consistent hashing preferred), run shards with leader+replica topology, and use a config service for membership.
  • Trade-offs: choose replication/ack strategy to balance latency vs consistency; per-shard LRU avoids global coordination.
  • Operationally: plan for hot keys, rebalancing costs, monitoring, and capacity management.

This design provides a balanced solution that prioritizes availability and performance while offering configurable consistency guarantees.
