
Design a Distributed LRU Cache

MEDIUM · 12 min · cache · distributed-systems · lru · sharding · consistency · replication
Asked at: Stripe

Design a distributed Least-Recently-Used cache with sharding, replication, eviction, and consistency trade-offs.

1. The Question

Design a distributed Least-Recently-Used (LRU) cache that supports inserts and retrievals across multiple cache servers. The cache must be highly available and scalable; discuss API, data structures, sharding, replication, consistency trade-offs, failure handling, and operational concerns.

2. Clarifying Questions

Key clarifying questions to ask the interviewer:

  • Is the cache read-heavy or write-heavy? (Typical caches are read-heavy.)
  • Are keys fixed-size strings/IDs and are values blobs with size limits?
  • Should the cache provide strong consistency, read-your-writes, or eventual consistency?
  • Expected QPS, number of unique keys, and average object size (to estimate capacity)?
  • Multi-region availability requirements and acceptable staleness on failover?
  • Do we need TTLs, eviction metrics, or transactionality across multiple keys?

Assume: mostly read-heavy, keys are strings, values up to tens of KB, and the system needs high availability with relaxed consistency (configurable).

3. Requirements

Functional:

  • PUT(key, value): insert/update an item.
  • GET(key): retrieve an item and move it to the MRU position.
  • DELETE(key): optional removal.
  • Eviction: per-shard LRU with capacity limits.

Non-functional:

  • High availability and low latency (ms).
  • Horizontal scalability (add/remove nodes with minimal disruption).
  • Configurable consistency (strong vs eventual trade-offs).
  • Monitoring and operational observability.

Constraints/assumptions:

  • Cache capacity per node is bounded.
  • Network partitions and node failures must be handled gracefully.
  • Returning slightly stale reads in exchange for availability is acceptable, unless strict consistency is requested.

4. Scale Estimates

Make assumptions to size the system:

  • QPS: 50k reads/sec and 5k writes/sec (example).
  • Unique keys: 200M hot keys overall; cold keys are served by datastore.
  • Average value size: 5 KB -> total hot working set ~ 1 TB.
  • Node capacity: 64 GB of memory usable for cache per node -> ~13M items per node at 5 KB each, so ~16 nodes hold one copy of the working set (roughly 48 with a replication factor of 3, plus headroom).

These numbers drive sharding, replication factor, and network requirements.
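
To sanity-check these figures, here is a quick back-of-the-envelope script (all workload numbers are the assumptions above, not measurements):

```python
# Back-of-the-envelope capacity math for the assumed workload.
KB, GB, TB = 1024, 1024**3, 1024**4

hot_keys = 200_000_000            # assumed unique hot keys
avg_value = 5 * KB                # assumed average value size
working_set = hot_keys * avg_value
print(f"working set ~ {working_set / TB:.2f} TB")       # ~0.93 TB

node_mem = 64 * GB                # usable cache memory per node
items_per_node = node_mem // avg_value
print(f"items per node ~ {items_per_node / 1e6:.1f}M")  # ~13.4M

replication = 3
nodes = -(-working_set * replication // node_mem)       # ceiling division
print(f"nodes at RF={replication} ~ {nodes}")           # ~45, before headroom
```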

5. Data Model

Key-value model:

  • Key: string (unique identifier)
  • Value: opaque blob (bytes) or typed object
  • Metadata stored per entry: size, lastAccessTs, version or sequence, optional TTL

On each cache node store:

  • In-memory hashmap: key -> pointer to node in LRU list and payload
  • Doubly-linked list nodes: maintain MRU at head and LRU at tail for O(1) eviction and move-to-front

Rationale: Hashmap + doubly-linked list gives O(1) get/put/evict operations for single-node LRU.
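
A minimal single-node sketch of that structure (Python; capacity counted in entries rather than bytes, and no locking, for brevity):

```python
class _Node:
    __slots__ = ("key", "value", "prev", "next")

    def __init__(self, key=None, value=None):
        self.key, self.value = key, value
        self.prev = self.next = None

class LRUCache:
    """Hashmap + doubly-linked list: O(1) get, put, and evict."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.map = {}                            # key -> _Node
        self.head, self.tail = _Node(), _Node()  # sentinels
        self.head.next, self.tail.prev = self.tail, self.head

    def _unlink(self, node):
        node.prev.next, node.next.prev = node.next, node.prev

    def _push_front(self, node):                 # head.next is the MRU slot
        node.prev, node.next = self.head, self.head.next
        self.head.next.prev = node
        self.head.next = node

    def get(self, key):
        node = self.map.get(key)
        if node is None:
            return None                          # miss
        self._unlink(node)                       # touch: move to MRU
        self._push_front(node)
        return node.value

    def put(self, key, value):
        node = self.map.get(key)
        if node:                                 # update in place
            node.value = value
            self._unlink(node)
        else:
            node = _Node(key, value)
            self.map[key] = node
            if len(self.map) > self.capacity:
                lru = self.tail.prev             # evict from the tail
                self._unlink(lru)
                del self.map[lru.key]
        self._push_front(node)
```

A production version would also track byte sizes, honor TTLs, and guard the structure with a lock (or shard it across threads).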

6. API Design

Simple HTTP/GRPC API between clients (or application library) and cache frontends:

  • PUT /cache/{key}

    • Body: value
    • Headers: x-cache-ttl (optional), x-write-mode (sync/async)
    • Response: 200 OK or error
  • GET /cache/{key}

    • Response: 200 + value OR 404 (miss)
  • DELETE /cache/{key}

    • Response: 200 or 404

Client-side library responsibilities:

  • Map key -> shard (consistent hashing or range map)
  • Retry/timeout policies
  • Local config refresh from central config service
  • Optionally fall back to the origin datastore on a miss (see the client sketch at the end of this section)

Design notes:

  • Expose read-preference headers (e.g., prefer-leader) if strong consistency is required.
  • Support bulk GET/PUT for batching hot-key workloads.
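
A minimal client-library sketch tying this section together (assumptions: shards speak the HTTP API above, HashRing is the consistent-hashing helper sketched in section 8, and origin_get is the caller's datastore loader):

```python
import requests

class CacheClient:
    def __init__(self, ring, origin_get, timeout=0.05, retries=2):
        self.ring = ring              # key -> node URL via consistent hashing
        self.origin_get = origin_get  # fallback loader for misses
        self.timeout = timeout
        self.retries = retries

    def get(self, key):
        url = f"{self.ring.node_for(key)}/cache/{key}"
        for _ in range(self.retries + 1):
            try:
                resp = requests.get(url, timeout=self.timeout)
            except requests.RequestException:
                continue                         # timeout/conn error: retry
            if resp.status_code == 200:
                return resp.content
            if resp.status_code == 404:          # miss: read-through to origin
                value = self.origin_get(key)
                if value is not None:
                    self.put(key, value)         # populate the cache
                return value
        return self.origin_get(key)              # cache unreachable: degrade

    def put(self, key, value, ttl=None):
        headers = {"x-cache-ttl": str(ttl)} if ttl is not None else {}
        url = f"{self.ring.node_for(key)}/cache/{key}"
        requests.put(url, data=value, headers=headers, timeout=self.timeout)
```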

7. High-Level Architecture

Components:

  • Clients / Application Servers: include a cache client library that knows shard mapping.
  • API Gateway / Load Balancer: optional for multi-tenant access.
  • Configuration Service: Zookeeper/Consul/Eureka storing shard mapping, cluster membership, and leader election.
  • Cache Shards: dedicated cache cluster sharded by key-space (consistent hashing or range-based).
    • Each shard has a primary (leader) and N replicas for reads/high availability.
  • Origin Datastore: authoritative source for cache misses.
  • Monitoring & Metrics: eviction rates, hit/miss ratios, per-shard QPS, memory pressure.

Data flow:

  • Client computes target shard and sends GET/PUT to that shard's node (or LB).
  • Shard leader handles writes and coordinates replication to followers.
  • Reads can go to leader or followers depending on consistency settings.
  • Config service keeps clients updated about membership and shard boundaries.

8. Detailed Design Decisions

Sharding strategy:

  • Prefer consistent hashing to minimize key movement when nodes added/removed.
  • Use virtual nodes to balance load across heterogeneous machines (see the ring sketch below).
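
A minimal ring sketch (vnodes=100 is an arbitrary choice; any stable hash works in place of MD5):

```python
import bisect
import hashlib

class HashRing:
    """Consistent hashing with virtual nodes for smoother load balance."""

    def __init__(self, nodes, vnodes=100):
        points = []                              # (hash, node) ring points
        for node in nodes:
            for i in range(vnodes):
                points.append((self._hash(f"{node}#{i}"), node))
        points.sort()
        self._hashes = [h for h, _ in points]
        self._nodes = [n for _, n in points]

    @staticmethod
    def _hash(s: str) -> int:
        return int.from_bytes(hashlib.md5(s.encode()).digest()[:8], "big")

    def node_for(self, key: str) -> str:
        idx = bisect.bisect(self._hashes, self._hash(key)) % len(self._hashes)
        return self._nodes[idx]

ring = HashRing(["cache-1:7000", "cache-2:7000", "cache-3:7000"])
print(ring.node_for("user:42"))  # adding a node later moves only ~1/N of keys
```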

Replication and consistency:

  • Replication factor (R): typically 3 (1 leader + 2 followers).
  • Writes: sent to leader; choose write acknowledgement strategy:
    • Acknowledge after leader write (low latency, eventual consistency)
    • Acknowledge after majority replication (quorum) for stronger consistency (see the quorum-write sketch after this list)
  • Reads: configurable
    • Read-from-leader for strong consistency
    • Read-from-follower for lower latency and higher throughput (accept eventual consistency)
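
A sketch of the acknowledgement choice (send_to_replica and leader_store are hypothetical stand-ins for the node's transport and local store):

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

_pool = ThreadPoolExecutor(max_workers=8)       # shared replication pool

def replicated_put(key, value, leader_store, replicas, send_to_replica,
                   ack_mode="leader"):
    """ack_mode='leader': ack after the local write (fast, eventual consistency).
    ack_mode='quorum': ack once a majority of all copies have the write."""
    leader_store.put(key, value)                # leader applies the write first
    futures = [_pool.submit(send_to_replica, r, key, value) for r in replicas]
    if ack_mode == "leader":
        return True                     # replication continues in the background
    acks = 1                            # the leader's own copy counts
    majority = (len(replicas) + 1) // 2 + 1   # majority of leader + replicas
    for fut in as_completed(futures):
        try:
            fut.result()
            acks += 1
        except Exception:
            pass                        # failed replica: re-replication fixes it
        if acks >= majority:
            return True
    return False
```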

Eviction (LRU) within shard:

  • Each shard maintains an in-memory hashmap + doubly-linked list for precise LRU.
  • On GET/PUT, move entry to head (MRU).
  • When capacity is exceeded, evict from the tail (LRU).

Cross-node eviction and global LRU:

  • Global LRU is expensive; instead maintain per-shard LRU and size-based partitioning so each shard enforces capacity.
  • Optionally use dynamic load rebalancing (move hot keys or adjust virtual nodes) to avoid hotspots.

Failure handling & rebalancing:

  • Use configuration service (Zookeeper/Consul) for membership and leader election (or Raft-based clusters).
  • When a node dies: config service removes it, clients update mapping, keys previously on the node are remapped (consistent hashing) to other nodes.
  • Re-replication: background process copies missing replicas from leader or origin store.

Cache warming and cold starts:

  • On node addition, optionally prefetch hot keys from origin or replicate from existing replicas to avoid cold misses.

Operational concerns:

  • Monitor memory pressure; under pressure, apply backpressure to writes or evict more aggressively.
  • Expose metrics: hits, misses, evictions, latencies, replication lag.

9. Bottlenecks & Scaling

Potential bottlenecks and mitigations:

  • Hot keys: single-key hotspots can overload a shard; mitigate by key-splitting, request coalescing (see the sketch after this list), or client-side sharding of large items.
  • Config service: single point of orchestration; run a highly available cluster (Raft/Zookeeper) with monitoring.
  • Rebalancing cost: moving many keys on scale-out can cause heavy network I/O; perform incremental migration and throttling.
  • Network bandwidth: replication and rebalancing consume bandwidth; use efficient serialization and chunking.
  • GC/heap pauses: prefer off-heap storage or languages/runtimes with predictable memory behavior; tune GC or use native memory caches.
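
A sketch of the request-coalescing ("singleflight") idea mentioned above: concurrent misses for one hot key share a single origin fetch (not a specific library's API):

```python
import threading

class SingleFlight:
    """Coalesce concurrent loads of the same key into one origin call."""

    def __init__(self):
        self._lock = threading.Lock()
        self._inflight = {}                 # key -> (event, result box)

    def do(self, key, loader):
        with self._lock:
            entry = self._inflight.get(key)
            owner = entry is None
            if owner:                       # first caller fetches for everyone
                entry = (threading.Event(), {})
                self._inflight[key] = entry
        event, box = entry
        if owner:
            try:
                box["value"] = loader(key)  # the single origin fetch
            finally:
                with self._lock:
                    del self._inflight[key]
                event.set()                 # wake all waiters
        else:
            event.wait()                    # piggyback on the in-flight fetch
        return box.get("value")
```

Wired into the client's miss path, this turns a stampede of N identical misses into one datastore read.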

Scaling directions:

  • Horizontal: add more shards (consistent hashing) and replicas.
  • Vertical: use larger memory nodes for hot partitions.
  • Multi-tier: combine local in-process LRU (per-app, tiny cache) + remote distributed cache for larger working set to reduce latency and load.

10. Follow-up Questions / Extensions

Possible extensions to discuss:

  • Provide TTL-based expiry alongside LRU and combined eviction policies.
  • Support multi-key atomic operations (compare-and-swap, transactions) and implications for consistency.
  • Hot-key mitigation strategies (client-side sharding, request batching, or adaptive replication).
  • Integrate with CDN or edge caches for geo-distribution.
  • Cache invalidation strategies for updates in the origin datastore.
  • Advanced routing: client-side consistent hashing vs centralized proxying and pros/cons.
  • Security: authentication, encryption in transit, and multi-tenant isolation.

11. Wrap-up

Summary:

  • Single-node LRU: implement with hashmap + doubly-linked list for O(1) ops.
  • Distributed: shard the keyspace (consistent hashing preferred), run shards with leader+replica topology, and use a config service for membership.
  • Trade-offs: choose replication/ack strategy to balance latency vs consistency; per-shard LRU avoids global coordination.
  • Operationally: plan for hot keys, rebalancing costs, monitoring, and capacity management.

This design provides a balanced solution that prioritizes availability and performance while offering configurable consistency guarantees.
