1. The Question
Design and implement a ranked cache system.
Primary goal: store key-value pairs with efficient reads and writes, evicting items based on a rank metric (e.g., popularity, frequency, or a custom score). Support Put(key, value, rank), Get(key), Evict(), and UpdateRank(key, newRank). Consider both single-instance and distributed modes, targeting high throughput, low latency, and a scalable design.
2. Clarifying Questions
- What does "rank" represent? (frequency, score from external system, TTL-weighted score)
- Is rank monotonic or can it decrease?
- Should rank be updated on every Get or batched?
- Capacity unit: number of items or bytes?
- Consistency model across replicas: eventual or strong?
- Required SLAs for latency (e.g., 1ms lookup)?
3. Requirements
Functional:
- Put(key, value, rank): insert/update entry with rank
- Get(key): return value if present
- Evict(): when full, remove lowest-ranked item
- UpdateRank(key, newRank): change rank efficiently
Non-functional:
- Low latency lookups (≈O(1))
- High throughput for reads/writes
- Scalable across nodes for large datasets
- Thread-safe concurrent operations
- Support optional TTL/expiration and metrics
4. Scale Estimates
- Single shard: 1M items, reads 50k/s, writes 5k/s as a baseline.
- Cluster: 100 shards → 100M total items, reads 5M/s.
- Memory per entry: key 50B, value avg 200B, metadata 32B → ~282B/entry (≈282MB for a 1M-item shard, ≈28GB across 100 shards).
- Network: expect cross-shard traffic only for routing/replication; item reads should be local to shard.
- Eviction rate depends on write arrival rate and capacity; design for O(log n) rank operations.
5. Data Model
Single-instance entry:
CacheEntry {
  key: string,
  value: bytes/object,
  rank: float/int,
  last_updated: timestamp,
  ttl?: timestamp (absolute expiry time, optional),
  heap_ptr?: handle into the ranking structure (optional)
}
Primary structures:
- HashMap<Key, CacheEntry> for O(1) lookups
- Min-heap or balanced BST ordered by rank for eviction (minimum-rank entry at the heap root or the tree's leftmost node)
- Optionally a time-indexed heap for TTL expirations
For distributed mode: keys are sharded (consistent hashing) to per-shard instances that maintain local HashMap + ranking structure.
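
As a minimal single-instance sketch of this model, assuming Java, string keys, a double rank, and ties broken on key (RankedCache and its method names are illustrative, not a fixed API):

import java.util.*;

class CacheEntry {
    final String key;
    Object value;
    double rank;
    long lastUpdated = System.currentTimeMillis();
    Long expiresAt;      // optional TTL deadline (epoch millis), null = no expiry
    int heapIdx = -1;    // handle into an indexed heap, if that structure is used

    CacheEntry(String key, Object value, double rank) {
        this.key = key; this.value = value; this.rank = rank;
    }
}

class RankedCache {
    private final int capacity;
    private final Map<String, CacheEntry> entries = new HashMap<>();
    // Ordered by (rank, key): the lowest-ranked entry is always first().
    private final TreeSet<CacheEntry> byRank = new TreeSet<>(
        Comparator.comparingDouble((CacheEntry e) -> e.rank).thenComparing(e -> e.key));

    RankedCache(int capacity) { this.capacity = capacity; }

    Object get(String key) {
        CacheEntry e = entries.get(key);
        return e == null ? null : e.value;
    }

    boolean put(String key, Object value, double rank) {
        CacheEntry existing = entries.get(key);
        if (existing != null) {                  // update in place, reordering the tree
            byRank.remove(existing);
            existing.value = value;
            existing.rank = rank;
            existing.lastUpdated = System.currentTimeMillis();
            byRank.add(existing);
            return false;
        }
        boolean evicted = entries.size() >= capacity;
        if (evicted) evict();
        CacheEntry e = new CacheEntry(key, value, rank);
        entries.put(key, e);
        byRank.add(e);
        return evicted;                          // report whether an eviction occurred
    }

    void evict() {
        CacheEntry min = byRank.pollFirst();     // O(log n) removal of the min-rank entry
        if (min != null) entries.remove(min.key);
    }
}

UpdateRank(key, newRank) follows the same remove-mutate-reinsert pattern as the update branch of put.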
6. API Design
Public API (sync semantics):
- Put(key, value, rank): void | boolean (success/eviction flag; see implementation notes)
- Get(key): value | null
- UpdateRank(key, newRank): boolean
- Delete(key): boolean
- GetStats(): {hits, misses, evictions, size}
Implementation notes:
- Put should return whether an eviction occurred.
- Provide async/bulk variants for batching writes/updates.
- Expose metrics and admin endpoints for cluster rebalancing.
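
One possible concrete shape for this API in Java; the generic signature and the CacheStats record are assumptions, with fields mirroring GetStats above:

interface RankedCacheApi<K, V> {
    boolean put(K key, V value, double rank);    // true if the insert caused an eviction
    V get(K key);                                // null if absent (or expired)
    boolean updateRank(K key, double newRank);   // false if the key is missing
    boolean delete(K key);
    CacheStats getStats();
}

// Snapshot of the counters exposed by GetStats().
record CacheStats(long hits, long misses, long evictions, int size) {}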
7. High-Level Architecture
Single-instance:
- In-memory HashMap for key→entry.
- Min-heap or RB-tree keyed by (rank, tie-breaker) to identify lowest-ranked entry.
- Locking: fine-grained per-entry locks, or a concurrent map plus a lock guarding ranking-structure updates.
Distributed:
- Router / client-side consistent-hashing library maps keys to shards.
- Each shard: independent ranked cache instance with local capacity and eviction.
- Optional replication for fault tolerance (async replication to followers).
- Global aggregator (optional) to collect cross-shard stats and compute global ranks if needed.
Eviction policy: local min-rank eviction. For global uniformity, run periodic rebalancing/migration based on aggregated ranks.
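
A sketch of the consistent-hashing router, assuming client-side routing, MD5 for ring positions, and virtual nodes for balance (HashRing and vnodes are illustrative names; at least one shard is assumed):

import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.util.*;

class HashRing {
    private final TreeMap<Long, String> ring = new TreeMap<>();
    private final int vnodes;

    HashRing(Collection<String> shards, int vnodes) {
        this.vnodes = vnodes;
        for (String s : shards) addShard(s);
    }

    void addShard(String shard) {
        // Virtual nodes spread each shard around the ring for smoother balance.
        for (int i = 0; i < vnodes; i++) ring.put(hash(shard + "#" + i), shard);
    }

    void removeShard(String shard) {
        for (int i = 0; i < vnodes; i++) ring.remove(hash(shard + "#" + i));
    }

    String shardFor(String key) {
        Map.Entry<Long, String> e = ring.ceilingEntry(hash(key));
        return (e != null ? e : ring.firstEntry()).getValue();  // wrap past the top of the ring
    }

    private static long hash(String s) {
        try {
            byte[] d = MessageDigest.getInstance("MD5").digest(s.getBytes(StandardCharsets.UTF_8));
            long h = 0;
            for (int i = 0; i < 8; i++) h = (h << 8) | (d[i] & 0xFF);
            return h;
        } catch (Exception ex) {
            throw new IllegalStateException(ex);  // MD5 is always available on the JVM
        }
    }
}

new HashRing(List.of("shard-0", "shard-1"), 128).shardFor("user:42") returns the owning shard; adding or removing a shard remaps only the keys on the affected arcs of the ring.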
8. Detailed Design Decisions
Choice of ranking structure:
- Min-heap: O(1) min lookup, O(log n) insertion/removal; removing an arbitrary element is hard without an index.
- Balanced BST (e.g., TreeMap): O(log n) insert/remove, and it supports ordered iteration.
- Hybrid: indexed heap with handles (store the heap index in CacheEntry) to allow O(log n) updates to arbitrary entries; see the sketch below.
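
A sketch of the hybrid option: an indexed binary min-heap that keeps each entry's position in its heapIdx handle (the heap_ptr field from the data model), so any entry's rank can be changed in O(log n). It reuses the CacheEntry sketch from section 5:

import java.util.*;

class IndexedMinHeap {
    private final List<CacheEntry> heap = new ArrayList<>();

    void push(CacheEntry e) {
        e.heapIdx = heap.size();
        heap.add(e);
        siftUp(e.heapIdx);
    }

    CacheEntry peekMin() { return heap.isEmpty() ? null : heap.get(0); }

    CacheEntry popMin() {
        CacheEntry min = peekMin();
        if (min == null) return null;
        CacheEntry last = heap.remove(heap.size() - 1);
        if (!heap.isEmpty()) {           // move the last element to the root, restore order
            heap.set(0, last);
            last.heapIdx = 0;
            siftDown(0);
        }
        min.heapIdx = -1;
        return min;
    }

    // The handle makes this O(log n); without it, finding the entry costs O(n).
    void updateRank(CacheEntry e, double newRank) {
        e.rank = newRank;
        siftUp(e.heapIdx);
        siftDown(e.heapIdx);
    }

    private void siftUp(int i) {
        while (i > 0) {
            int p = (i - 1) / 2;
            if (heap.get(p).rank <= heap.get(i).rank) break;
            swap(i, p);
            i = p;
        }
    }

    private void siftDown(int i) {
        for (int n = heap.size();;) {
            int l = 2 * i + 1, r = l + 1, s = i;
            if (l < n && heap.get(l).rank < heap.get(s).rank) s = l;
            if (r < n && heap.get(r).rank < heap.get(s).rank) s = r;
            if (s == i) return;
            swap(i, s);
            i = s;
        }
    }

    private void swap(int a, int b) {
        CacheEntry ea = heap.get(a), eb = heap.get(b);
        heap.set(a, eb); heap.set(b, ea);
        ea.heapIdx = b; eb.heapIdx = a;
    }
}

Compared with the TreeSet approach, the heap has smaller constant factors but gives up ordered iteration.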
Rank updates:
- Immediate update: Update heap/tree on every rank change → correct but higher CPU.
- Batched/delta updates: buffer rank increments and apply them periodically to amortize cost (see the sketch below).
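
A sketch of the batched approach, assuming ranks are frequency counts; DeltaSink is a hypothetical hook the cache would implement to apply a buffered delta:

import java.util.Iterator;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

interface DeltaSink {
    void updateRankBy(String key, long delta);   // hypothetical hook on the cache
}

class RankBatcher {
    private final ConcurrentHashMap<String, LongAdder> pending = new ConcurrentHashMap<>();

    // Hot path: record an increment without touching the ranking structure.
    void bump(String key, long delta) {
        pending.computeIfAbsent(key, k -> new LongAdder()).add(delta);
    }

    // Periodic path (e.g., from a scheduled task): drain buffered deltas in one pass.
    // An increment racing with the drain may slip to the next flush, which is fine
    // when ranks only need to be approximately current.
    void flush(DeltaSink cache) {
        Iterator<Map.Entry<String, LongAdder>> it = pending.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<String, LongAdder> e = it.next();
            it.remove();
            cache.updateRankBy(e.getKey(), e.getValue().sum());
        }
    }
}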
Concurrency:
- Use ConcurrentHashMap for entries. Guard the ranking structure with its own lock (pairing it with a handle-indexed heap), or shard the ranking structure by key to reduce contention (see the sketch below).
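
A sketch of a sharded ranking structure: N independently locked TreeSets, so rank updates on different shards do not contend; the shard count and the synchronized blocks are tuning assumptions:

import java.util.*;

class ShardedRanking {
    private final List<TreeSet<CacheEntry>> shards = new ArrayList<>();
    private final int n;

    ShardedRanking(int n) {
        this.n = n;
        for (int i = 0; i < n; i++) {
            shards.add(new TreeSet<>(
                Comparator.comparingDouble((CacheEntry e) -> e.rank).thenComparing(e -> e.key)));
        }
    }

    private TreeSet<CacheEntry> shardFor(String key) {
        return shards.get(Math.floorMod(key.hashCode(), n));
    }

    void add(CacheEntry e) {
        TreeSet<CacheEntry> s = shardFor(e.key);
        synchronized (s) { s.add(e); }
    }

    void reRank(CacheEntry e, double newRank) {
        TreeSet<CacheEntry> s = shardFor(e.key);
        synchronized (s) {               // remove-mutate-reinsert keeps the tree consistent
            s.remove(e);
            e.rank = newRank;
            s.add(e);
        }
    }

    // Eviction candidate: the minimum over per-shard minima, one short lock per shard.
    CacheEntry globalMin() {
        CacheEntry best = null;
        for (TreeSet<CacheEntry> s : shards) {
            synchronized (s) {
                if (!s.isEmpty() && (best == null || s.first().rank < best.rank)) best = s.first();
            }
        }
        return best;
    }
}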
Distributed consistency:
- Prefer eventual consistency for ranks across replicas; strong consistency incurs latency.
- For critical write-through use-cases, use primary-replica with synchronous write to primary and async to replicas (configurable).
9. Bottlenecks & Scaling
Potential bottlenecks:
- Ranking structure contention under high update rates (rank-churn). Mitigation: batch updates, sharded ranking, or approximate ranks (sketches).
- Hot keys causing disproportionate rank changes. Mitigation: local promotion thresholds, rate limiting updates.
- Memory pressure: evict aggressively or spill cold items to disk/secondary store.
- Network cost of global ranking: keep the global aggregator asynchronous; avoid cross-shard synchronization on every access.
Scaling strategies:
- Shard by key (consistent hashing). Add/remove shards with rebalancing.
- Replicate shards for HA. Use quorum or leader/follower depending on consistency needs.
- Use TTL + lazy expiration to reclaim memory quickly.
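
A minimal sketch of lazy expiration layered onto the earlier RankedCache get, assuming expiresAt holds an absolute deadline in epoch millis:

// Inside RankedCache: expired entries are reclaimed on access rather than by a sweeper.
Object get(String key) {
    CacheEntry e = entries.get(key);
    if (e == null) return null;
    if (e.expiresAt != null && e.expiresAt <= System.currentTimeMillis()) {
        byRank.remove(e);                // lazy expiration: drop the dead entry now
        entries.remove(key);
        return null;
    }
    return e.value;
}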
10. Follow-up Questions / Extensions
- Support a weighted rank combining frequency and recency (e.g., rank = alpha*freq + beta*recency).
- Support size-aware eviction (evict by lowest rank-per-byte ratio); see the scoring sketch after this list.
- Global ranking across shards: design aggregator and migration plan.
- Integrate with write-through to backing store with crash recovery.
- Allow per-key eviction policies (some keys pinned or protected).
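
A tiny illustration combining the first two extensions; alpha, beta, and the 1/(1+age) recency decay are hypothetical choices:

// Weighted frequency+recency score, normalized by size for byte-aware eviction.
static double score(long freq, long lastAccessMillis, int sizeBytes, double alpha, double beta) {
    double ageSeconds = (System.currentTimeMillis() - lastAccessMillis) / 1000.0;
    double recency = 1.0 / (1.0 + ageSeconds);       // decays toward 0 as the entry ages
    return (alpha * freq + beta * recency) / Math.max(1, sizeBytes);
}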
11. Wrap-up
A ranked cache balances O(1) lookups via a HashMap with an O(log n) ranking structure for eviction and updates. For high throughput, reduce ranking contention by batching updates, sharding ranking state, or using handles in an indexed heap. Distributed deployments require sharding and optional replication; global rank consistency is possible but adds complexity and coordination overhead.