
Design TikTok

Difficulty: Hard · ~12 min · Topics: video feed, CDN, scalability, cache, database, API

Design a scalable backend for a mobile short-video app supporting uploads, personalized feed, likes, follows, and high read throughput.

1. The Question

Design a backend system for a short-video mobile app (TikTok-like). The system should allow users to upload short videos (<= 1 min + caption), view a vertically-scrolling feed (personalized + follow feed), and interact with videos (likes, follows, comments). Focus primarily on backend architecture, data model, API design, and scaling to ~1M daily active users (DAU) with high read volume.

2. Clarifying Questions

  • Are we building mobile clients? (Assume client-agnostic REST/gRPC APIs.)
  • Feed type: follow-only or personalized? (Support both; core design will enable personalized recommendations via a precache service.)
  • Max video length? (Assume <= 1 minute compressed H.264.)
  • Which interactions matter? (Likes, follows, comments; basic share/forward optional.)

3. Requirements

Functional:

  • Upload video + caption
  • Serve a per-user feed (personalized + follow stream)
  • Like/follow/comment interactions
  • Preload top N videos for low startup latency

Non-functional:

  • High availability (target ~99.999%)
  • Low read latency for feed playback
  • Scale to ~1M DAU, burst to 10x
  • Efficient storage & CDN-backed delivery for large video blobs

4. Scale Estimates

  • Users: 1,000,000 DAU (assumption)
  • Video size: ~5 MB per 1-minute compressed video (H.264)
  • Uploads: assume 2 uploads/user/day => 10 MB/day/user
  • Daily video ingest: 1,000,000 * 10 MB = 10,000,000 MB = 10 TB/day
  • Monthly raw storage (30d): ~300 TB (before redundancy/replication/transcodes)
  • Read-heavy workload: feed reads far outnumber writes; expect concurrency spikes when a video goes viral
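
A quick arithmetic check of the ingest numbers above (a sketch; the 5 MB/video and 2 uploads/user/day figures are the stated assumptions, and decimal units are used as in the estimates):

```python
# Back-of-envelope check of the scale estimates above.
DAU = 1_000_000                  # daily active users (assumption)
VIDEO_MB = 5                     # ~5 MB per 1-minute H.264 video (assumption)
UPLOADS_PER_USER_PER_DAY = 2     # assumption

daily_ingest_mb = DAU * UPLOADS_PER_USER_PER_DAY * VIDEO_MB
daily_ingest_tb = daily_ingest_mb / 1_000_000   # 1 TB = 1,000,000 MB (decimal units)
monthly_raw_tb = daily_ingest_tb * 30           # before replication / transcoded variants

print(f"daily ingest: {daily_ingest_tb:.0f} TB/day")   # ~10 TB/day
print(f"30-day raw:   {monthly_raw_tb:.0f} TB")        # ~300 TB
```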

5. Data Model

Use a relational DB for structured metadata + separate tables for activity. Example tables:

  • users(user_id PK, username, profile_meta, created_at)
  • videos(video_id PK, user_id FK, s3_url, thumbnail_url, caption, length_sec, codec_meta, created_at)
  • follows(follower_id, followee_id, created_at) -- indexed by follower_id
  • likes(user_id, video_id, created_at) -- append-heavy
  • comments(comment_id, video_id, user_id, text, created_at)
  • feed_cache(user_id, playlist[] or pointer, updated_at) -- for precached playlists

Store large binary blobs (video, thumbnails) in object storage (S3-compatible). Keep metadata in the DB. Use time-series / analytics store for metrics/logs.
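A minimal sketch of the core metadata tables (illustrative DDL run against SQLite so the snippet is self-contained; a production deployment would use Postgres with proper types, indexes, and partitioning, and feed_cache would typically live in Redis rather than the relational DB):

```python
import sqlite3

# Illustrative schema only; columns follow the tables listed above.
DDL = """
CREATE TABLE users    (user_id INTEGER PRIMARY KEY, username TEXT, profile_meta TEXT, created_at TEXT);
CREATE TABLE videos   (video_id INTEGER PRIMARY KEY, user_id INTEGER REFERENCES users(user_id),
                       s3_url TEXT, thumbnail_url TEXT, caption TEXT, length_sec INTEGER,
                       codec_meta TEXT, created_at TEXT);
CREATE TABLE follows  (follower_id INTEGER, followee_id INTEGER, created_at TEXT,
                       PRIMARY KEY (follower_id, followee_id));
CREATE TABLE likes    (user_id INTEGER, video_id INTEGER, created_at TEXT,
                       PRIMARY KEY (user_id, video_id));
CREATE TABLE comments (comment_id INTEGER PRIMARY KEY, video_id INTEGER, user_id INTEGER,
                       text TEXT, created_at TEXT);
CREATE INDEX idx_follows_follower ON follows(follower_id);
CREATE INDEX idx_videos_user      ON videos(user_id, created_at);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(DDL)   # runs all statements in one call
```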

6. API Design

Key endpoints (HTTP/REST or gRPC):

  • POST /upload/video

    • payload: video multipart or pre-signed URL upload; caption, user_id, metadata
    • flow: client requests presigned upload -> upload to object store -> notify metadata service -> persist video record
  • GET /feed?user_id={uid}&cursor={c}&limit={n}

    • returns ordered list of video metadata + CDN URLs (first N preloaded)
  • POST /video/{id}/like

    • body: user_id; writes to likes table + activity log
  • POST /user/{id}/follow

    • body: follower_id, followee_id
  • GET /user/{id}/activity (likes/follows)

Notes: Use presigned PUT to offload video upload bandwidth from API servers. Keep metadata write path lightweight.
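A minimal sketch of the upload and feed endpoints, assuming FastAPI and boto3; the bucket name and the Redis-backed `get_precached_feed` helper are hypothetical placeholders, and the metadata write happens in a later step once the client (or an S3 event) confirms the upload:

```python
import uuid

import boto3
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()
s3 = boto3.client("s3")
BUCKET = "shortvideo-raw-uploads"   # assumed bucket name


def get_precached_feed(user_id: int, cursor: str | None, limit: int) -> list[dict]:
    """Stub: the real implementation reads the precomputed playlist from Redis."""
    return []


class UploadRequest(BaseModel):
    user_id: int
    caption: str


@app.post("/upload/video")
def request_upload(req: UploadRequest):
    """Step 1 of the upload flow: hand the client a presigned PUT URL
    so video bytes bypass the API servers entirely."""
    key = f"raw/{req.user_id}/{uuid.uuid4()}.mp4"
    url = s3.generate_presigned_url(
        "put_object",
        Params={"Bucket": BUCKET, "Key": key},
        ExpiresIn=900,
    )
    # The video record is persisted and the transcode job enqueued once
    # the upload is confirmed (separate callback / S3 event, not shown).
    return {"upload_url": url, "object_key": key}


@app.get("/feed")
def get_feed(user_id: int, cursor: str | None = None, limit: int = 10):
    """Return precached playlist entries: video metadata plus CDN URLs."""
    items = get_precached_feed(user_id, cursor, limit)
    return {"items": items, "next_cursor": items[-1]["video_id"] if items else None}
```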

7. High-Level Architecture

Components:

  • Mobile client
  • API gateway / load balancer -> stateless API service fleet
  • Auth service (token issuance/validation)
  • Object storage (S3) for raw videos + transcoded variants
  • CDN (Akamai/CloudFront) in front of object storage for low-latency video delivery
  • Relational primary DB for metadata (write master) + read replicas
  • Redis or Memcached for per-user cache and hot-object caching
  • Precache service / feed generation workers that build personalized playlists and populate Redis
  • Message queue (Kafka/SQS) for async tasks: transcode, analytics, feed updates, notifications
  • Transcoding service to generate multiple bitrates and thumbnails
  • Regionized (multi-region) deployments and a sharding layer for horizontal scale

Flow highlights:

  • Upload: client -> presigned S3 PUT -> notify API -> persist metadata -> enqueue transcode -> transcoded variants stored in object storage and served via the CDN
  • Feed fetch: client -> API -> read Redis precache -> return metadata & CDN URLs; client fetches blobs from CDN
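
A sketch of the feed-read path, assuming the precache service stores each user's playlist as a Redis list of video IDs under a key like `feed:{user_id}` and hot video metadata as JSON under `video_meta:{video_id}`; key names, the CDN hostname, and the fallback helper are illustrative:

```python
import json

import redis

r = redis.Redis(host="feed-cache", port=6379, decode_responses=True)
CDN_BASE = "https://cdn.example.com"   # assumed CDN hostname


def rebuild_feed(user_id: int, limit: int) -> list[str]:
    """Stub: in the real system the feed-generation worker rebuilds the playlist."""
    return []


def fetch_feed(user_id: int, offset: int = 0, limit: int = 10) -> list[dict]:
    """Read the precomputed playlist; fall back to the feed builder on a cache miss."""
    video_ids = r.lrange(f"feed:{user_id}", offset, offset + limit - 1)
    if not video_ids:
        video_ids = rebuild_feed(user_id, limit)

    # Hydrate metadata from the hot-object cache with one pipelined round trip.
    pipe = r.pipeline()
    for vid in video_ids:
        pipe.get(f"video_meta:{vid}")
    metas = [json.loads(m) for m in pipe.execute() if m]

    # Attach CDN URLs; the client fetches the actual video blobs from the CDN.
    for m in metas:
        m["cdn_url"] = f"{CDN_BASE}/videos/{m['video_id']}/720p.m3u8"
    return metas
```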

8. Detailed Design Decisions

  • Metadata DB: Relational (Postgres) for structured relationships (users->videos->comments). Use read replicas to separate reads from writes.
  • Blob storage: S3-compatible object store for large binary; cheap, durable, integrated with CDN.
  • Caching & precache: Precompute personalized playlists into Redis per user (top N videos). Reduces on-the-fly compute and DB load.
  • CDN: Critical for absorbing viral spikes and reducing origin bandwidth. Transcoded video is immutable, so long cache TTLs are safe; mutable metadata needs short TTLs or should bypass the CDN.
  • Upload flow: Use presigned URLs so API servers don't host video uploads.
  • Transcoding: Async pipeline consuming uploads from queue; store variants and update metadata when ready.
  • Sharding: Shard write DB by user_id or by region to distribute write load at scale.
  • Consistency: Eventual consistency acceptable for feeds; likes/follows must be persisted but can propagate to caches asynchronously.
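
A minimal illustration of routing metadata writes to a shard by hashing user_id; shard count and connection strings are placeholders, and production systems usually add a directory service or consistent hashing so shards can be added without rehashing everything:

```python
import hashlib

# Placeholder connection strings, one per shard.
SHARD_DSNS = [
    "postgresql://meta-shard-0/app",
    "postgresql://meta-shard-1/app",
    "postgresql://meta-shard-2/app",
    "postgresql://meta-shard-3/app",
]


def shard_for_user(user_id: int) -> str:
    """Stable hash of user_id -> shard DSN, so a user's videos, likes, and
    outbound follow edges colocate on a single shard."""
    digest = hashlib.md5(str(user_id).encode()).hexdigest()
    return SHARD_DSNS[int(digest, 16) % len(SHARD_DSNS)]
```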

9. Bottlenecks & Scaling

  • Video delivery: origin bandwidth and latency — mitigate with CDN + multiple edge POPs.
  • DB write throughput: high ingest of metadata and activity; mitigate with sharding, write scaling, and partitioning.
  • Feed generation: expensive personalization at scale; mitigate by precaching, incremental updates, and approximate algorithms.
  • Hot objects (viral videos): let the CDN and edge caches absorb the hotspot; apply rate limiting and request collapsing at the origin (see the sketch after this list).
  • Transcoding pipeline: scale horizontally; use autoscaling workers and spot instances for cost efficiency.
  • Cache invalidation: ensure user actions (like, follow) update precache or are merged at read time; use TTLs and async invalidation.
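
A sketch of request collapsing at the origin: concurrent cache misses for the same hot key share a single backend fetch instead of stampeding the DB. This is a simplified in-process version with no error propagation; distributed setups typically use a lock in Redis or the CDN's own request-collapsing feature.

```python
import threading


class SingleFlight:
    """Collapse concurrent calls for the same key into one backend fetch."""

    def __init__(self):
        self._lock = threading.Lock()
        self._inflight: dict[str, threading.Event] = {}
        self._results: dict[str, object] = {}

    def do(self, key: str, fetch):
        with self._lock:
            event = self._inflight.get(key)
            leader = event is None
            if leader:
                event = threading.Event()
                self._inflight[key] = event

        if leader:
            try:
                # Only the first caller hits the backend.
                self._results[key] = fetch()
            finally:
                event.set()
                with self._lock:
                    self._inflight.pop(key, None)
        else:
            # Followers wait for the leader's result instead of fetching again.
            event.wait()
        return self._results.get(key)
```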

10. Follow-up Questions / Extensions

  • Recommendation engine: offline batch + online scoring; features from user interactions, video embeddings, collaborative filtering.
  • Personalization freshness: how to balance surfacing new uploads in the feed against feed stability
  • Moderation: automated content moderation (ML) + human review; policy for removed content and cache invalidation
  • Multi-region deployment: geo-routing, data residency, replication
  • Analytics & metrics: realtime dashboards, A/B testing, retention tracking
  • Cost optimization: long-term cold storage, TTLs for inactive videos, transcode on-demand for rarely viewed bitrates

11. Wrap-up

The design decouples large binary delivery (object store + CDN) from metadata (relational DB), uses caching and a precache service to absorb heavy read traffic and keep feed latency low, and handles upload, transcoding, and analytics through asynchronous pipelines. The critical scaling levers are caching, CDNs, DB sharding, read replicas, and precomputed personalized feeds.
