1. The Question
Design a URL shortening service (like TinyURL) that:
- Generates a short unique alias given a long URL.
- Redirects users from short links to the original URL with low latency.
- Supports optional custom aliases and expiration times.
- Provides analytics (click counts, geolocation, referrers) and REST APIs.
Non-functional goals: high availability, low latency redirects, unguessable keys, and horizontal scalability.
2. Clarifying Questions
- Is authentication required for creating links? (Assume optional; support API keys for rate limits.)
- Are short links case-sensitive? (Assume case-sensitive base62-style keys.)
- Max length for custom aliases? (Assume 16 chars.)
- Expected SLAs for redirect latency? (Aim for sub-100ms P95.)
- Retention policy for stored mappings and analytics? (Assume 5 years for mappings.)
3. Requirements
Functional:
- Create short link: given original_url (+ optional custom_alias, expire_date, user_id).
- Redirect: given short key, return HTTP 302 to original_url.
- Support custom aliases and expirations.
- Expose REST APIs for create/delete/list.
- Provide analytics (click counts, basic metadata).
Non-functional:
- Highly available and durable storage for mappings.
- Redirects must be low-latency and handle high read QPS.
- Short keys should be hard to guess.
- Rate limiting / abuse prevention per API key.
4. Scale Estimates
Assumptions (from prompt):
- 500M new shortenings / month => ~200 writes/sec.
- Read:write = 100:1 => ~20k redirects/sec.
- Store 500M/month * 5 years * 12 = 30B records.
- Avg object size ~500 bytes => ~15 TB storage.
- Incoming write bandwidth ~100 KB/s; outgoing read bandwidth ~10 MB/s.
- Caching 20% hot traffic => ~170 GB memory for cache.
Capacity goals:
- Handle 20k QPS of redirects with low latency.
- Support 200 QPS of creates and burst above that via queueing.
5. Data Model
Use a NoSQL wide-column or key-value store (Cassandra/DynamoDB/CockroachDB) for scale and availability.
Primary table: url_mapping (PK = short_key)
- short_key (string, PK)
- original_url (string)
- user_id (optional)
- created_at (timestamp)
- expire_at (timestamp)
- custom_alias (bool)
- metadata (JSON: title, tags)
- click_count (counter or stored in analytics store)
Secondary tables:
- user_urls: partition by user_id to list created links.
- analytics or events stream: store click events for aggregation (Kafka -> OLAP store).
Indexing: short_key must be primary lookup. For custom alias checks, ensure uniqueness constraint during creation.
6. API Design
REST APIs (examples):
POST /v1/shorten
- Request: { api_dev_key, original_url, custom_alias?, user_id?, expire_date? }
- Response: { short_url }
DELETE /v1/links/{short_key}
- Request: { api_dev_key }
- Response: { status }
GET /{short_key}
- Redirect endpoint: 302 -> original_url or 404/410 if not found/expired.
Additional APIs:
GET /v1/links/{user_id} - list user's links GET /v1/analytics/{short_key}?range=30d - aggregated stats
Rate limiting: enforce per-api_dev_key quotas on create and redirect calls.
7. High-Level Architecture
Components:
- Edge Load Balancers / CDN: terminate requests and route to application fleet.
- Application Servers: handle create and redirect logic.
- Cache Layer (Memcached/Redis): hot short_key -> original_url mapping for low-latency reads.
- Key Generation Service (KGS): offline/random key generator that seeds available keys.
- NoSQL Storage: primary durable store for mappings.
- Analytics Pipeline: click events emitted (Kafka) -> stream processors -> OLAP store for reports.
- Admin/Cleanup Worker: reclaims expired keys and optionally returns them to KGS.
- Auth/Quota Service: validate api_dev_key and enforce quotas.
Request flow (redirect):
Client -> LB -> App -> Cache (hit => redirect). On cache miss: App -> DB (lookup) -> update cache -> redirect.
Creation flow:
Client -> LB -> App -> check custom alias uniqueness in DB -> reserve/generate key via KGS or DB -> write mapping -> return short URL.
8. Detailed Design Decisions
Key generation strategies:
-
Hash-based encoding: hash(original_url + salt) then base62 encode. Pros: deterministic. Cons: collisions, predictability, duplicate inputs may produce same key.
-
KGS (recommended): pre-generate random keys (base62) and store unused pool. App servers request keys from KGS or read from key-DB. Pros: simple, unpredictable, avoids collisions at generation time. Cons: need to manage pool and ensure KGS HA.
Collision & concurrency:
- For custom aliases: perform an atomic insert with conditional 'if not exists' semantics in DB.
- For KGS-provided keys: mark as used atomically when consuming; KGS keeps in-memory batches for speed and moves them to used table immediately to avoid re-use.
Caching:
- Cache hot mappings in Redis/Memcached with TTL = min(expire_at - now, default_ttl).
- Use LRU eviction; replicate cache for read scale.
Analytics:
- Do not update click_count in main DB synchronously. Emit events to Kafka and update counters asynchronously to avoid write hot spots.
Expiration & cleanup:
- Lazy deletion on access if expired (return 410) and schedule background cleanup job to reclaim keys and optionally return to KGS.
9. Bottlenecks & Scaling
Potential bottlenecks and mitigations:
- Read traffic (redirects) => mitigate with large distributed cache layer, edge caching (CDN) and horizontally scaled app servers.
- Hot keys (a viral short URL) => use per-key rate limiters, shard analytics, and ensure DB partition for that key can handle throughput (use in-memory counters + async flush).
- KGS single point => run KGS in active-standby or partition key-space across multiple KGS instances; allow apps to cache batches.
- DB partition skew => use consistent hashing to distribute keys; re-shard with virtual nodes if needed.
- Analytics write pressure => use append-only event stream (Kafka) and stream processors for aggregation.
- Abuse (mass key consumption) => enforce api_dev_key quotas and CAPTCHAs for anonymous high-volume usage.
10. Follow-up Questions / Extensions
- Add custom domains per user (user.domain/abc) and routing rules.
- Support vanity URLs and access control (private links with auth tokens).
- Enforce phishing/malware detection on target URLs (scan on create).
- Multi-region deployment with geo-DNS and multi-master DB for regional failover.
- Short key rotation and deterministic re-mapping for GDPR deletion requests.
11. Wrap-up
A production-grade URL shortener requires careful choices around key generation, caching, and DB partitioning to support heavy read traffic with low latency. Use an offline/random KGS, aggressive caching at the edge, event-driven analytics, and strong quota controls to prevent abuse. Design for horizontal scale and incremental improvements (analytics, custom domains, security) after the core system is live.