1. The Question
Design a YouTube-like service that lets users upload large videos (tens of GB), supports resumable uploads, and allows viewers to stream videos with low latency and adaptive bitrate. Focus on upload and streaming features, availability over consistency, and scale: ~1M uploads/day and ~100M views/day. Out of scope: content moderation, anti-bot systems, and monitoring/alerting.
2. Clarifying Questions
- Do we need live-streaming? (Assume no — only VOD.)
- Are we responsible for monetization / recommendations? (Out of scope; focus on upload/streaming.)
- Target latency for start of playback? (Aim for <2s to first frame under normal network conditions.)
- Do we require absolute consistency for metadata like view counts? (Prefer availability; eventual consistency OK.)
3. Requirements
Functional (core):
- Upload videos (support multi-GB, resumable uploads).
- Stream/watch videos (adaptive bitrate, low latency, wide device support).
Non-functional (core):
- Highly available (favor availability over strict consistency).
- Handle ~1M uploads/day and ~100M views/day.
- Support low-latency streaming even in low-bandwidth environments.
- Resumable uploads for unreliable networks.
Non-functional (out of scope): content moderation, bot protection, monitoring.
4. Scale Estimates
- Uploads: 1M videos/day ≈ 11.6 uploads/sec average; assume a ~10× peak factor and design for ~120 uploads/sec.
- Views: 100M views/day ≈ 1,157 views/sec average; peak factor 10 → ~12k views/sec; viral videos cause hotspots (millions of views/day each).
- Storage: assume an average video of 100MB (conservative); 1M/day → ~100TB/day of originals, or roughly 36PB/year raw; transcoded renditions and replicas multiply that severalfold, so plan for tens of PB/year.
- Bandwidth: heavy outbound egress; use CDN to offload origin and reduce latency.
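A quick back-of-envelope check of the numbers above (the 10× peak factor and 100MB average size are the stated assumptions):

```python
# Back-of-envelope check of the scale estimates; all inputs are assumptions from the text.
SECONDS_PER_DAY = 86_400

uploads_per_day = 1_000_000
views_per_day = 100_000_000
peak_factor = 10
avg_video_bytes = 100 * 10**6  # 100 MB average original

avg_upload_rps = uploads_per_day / SECONDS_PER_DAY        # ~11.6/s
avg_view_rps = views_per_day / SECONDS_PER_DAY            # ~1,157/s
peak_upload_rps = avg_upload_rps * peak_factor            # ~116/s
peak_view_rps = avg_view_rps * peak_factor                # ~11,600/s

raw_tb_per_day = uploads_per_day * avg_video_bytes / 10**12   # ~100 TB/day of originals
raw_pb_per_year = raw_tb_per_day * 365 / 1_000                # ~36.5 PB/year before transcodes/replicas

print(f"uploads: {avg_upload_rps:.1f}/s avg, ~{peak_upload_rps:.0f}/s peak")
print(f"views: {avg_view_rps:.0f}/s avg, ~{peak_view_rps:.0f}/s peak")
print(f"storage: ~{raw_tb_per_day:.0f} TB/day raw, ~{raw_pb_per_year:.1f} PB/year raw")
```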
5. Data Model
Primary entities (high-level):
- User { userId, name, createdAt }
- Video { videoId, uploaderId, title, description, status, createdAt, duration, thumbnails }
- VideoMetadata { videoId, formats: [{formatId, codec, container, resolution, bitrate, manifestUrl}], storageOriginUrl }
- UploadSession { uploadId, videoId, uploaderId, parts: [{partNumber, checksum, size, status}], createdAt }
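A minimal sketch of these records as Python dataclasses; field names mirror the entities above, while concrete types and status values are assumptions:

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Optional

@dataclass
class VideoFormat:
    format_id: str
    codec: str            # e.g. "h264", "vp9" (assumed values)
    container: str        # e.g. "mp4", "ts"
    resolution: str       # e.g. "1080p"
    bitrate_kbps: int
    manifest_url: str

@dataclass
class Video:
    video_id: str
    uploader_id: str
    title: str
    description: str
    status: str           # assumed states: "uploading" | "processing" | "ready" | "failed"
    created_at: datetime
    duration_seconds: Optional[float] = None
    thumbnails: list[str] = field(default_factory=list)
    formats: list[VideoFormat] = field(default_factory=list)   # VideoMetadata.formats
    storage_origin_url: Optional[str] = None

@dataclass
class UploadPart:
    part_number: int
    checksum: str         # doubles as an idempotency key for retried parts
    size_bytes: int
    status: str           # "pending" | "uploaded"

@dataclass
class UploadSession:
    upload_id: str
    video_id: str
    uploader_id: str
    parts: list[UploadPart]
    created_at: datetime
```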
Storage choices:
- Video metadata: Cassandra (partition by videoId) — leaderless, highly available, horizontally scalable.
- Blob storage: S3-like object store for original uploads, transcoded segments, manifests.
- Caching: edge CDN for segments and manifests; distributed cache (e.g., Redis cluster) for hot metadata.
6. API Design
Minimal APIs (evolving):
- POST /presigned_upload (body: video metadata) -> returns uploadSessionId and presigned multipart URLs for direct S3 uploads.
- GET /videos/{videoId} -> returns VideoMetadata (manifests, formats, thumbnails, status).
- POST /uploads/{uploadSessionId}/complete -> server validates parts and marks upload complete (triggers processing pipeline).
- GET /videos/{videoId}/stream -> (optional) returns primary manifest or redirects to CDN URL.
Notes: clients use multipart/resumable uploads directly to object store via presigned URLs. The backend stores UploadSession with parts and statuses so clients can resume.
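A sketch of how the control plane might issue presigned multipart URLs with boto3 against an S3-compatible store; the bucket name, key layout, part size, and URL expiry are assumptions, not prescribed values:

```python
import boto3

s3 = boto3.client("s3")
BUCKET = "raw-uploads"            # assumed bucket name
PART_SIZE = 64 * 1024 * 1024      # assumed 64 MB parts
URL_TTL_SECONDS = 3600            # assumed presigned-URL expiry

def create_upload_session(video_id: str, total_size_bytes: int) -> dict:
    """Start an S3 multipart upload and presign one URL per expected part."""
    key = f"originals/{video_id}"
    mpu = s3.create_multipart_upload(Bucket=BUCKET, Key=key)
    upload_id = mpu["UploadId"]

    num_parts = -(-total_size_bytes // PART_SIZE)   # ceiling division
    part_urls = [
        s3.generate_presigned_url(
            ClientMethod="upload_part",
            Params={
                "Bucket": BUCKET,
                "Key": key,
                "UploadId": upload_id,
                "PartNumber": part_number,
            },
            ExpiresIn=URL_TTL_SECONDS,
        )
        for part_number in range(1, num_parts + 1)
    ]
    # The UploadSession (uploadId, videoId, expected parts) is persisted here
    # so the client can resume after a failure.
    return {"uploadSessionId": upload_id, "partUrls": part_urls}
```

The client PUTs each part to its presigned URL and keeps the ETag S3 returns; those ETags come back on the complete call.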
7. High-Level Architecture
Components:
- Client (web/mobile): requests presigned URLs, uploads parts to object store, queries metadata, plays via manifest.
- API / Control Plane (stateless): issues presigned URLs, manages metadata, coordinates upload sessions, enqueues processing jobs.
- Object Store (S3): stores original uploads, transcoded segments, manifests.
- Transcoding & Post-processing (worker fleet + orchestrator like Temporal): split, transcode into multiple formats/resolutions, generate manifests (HLS/DASH), create thumbnails/subtitles.
- CDN / Edge Cache: caches manifests and segments globally to serve viewers with low latency.
- Metadata DB (Cassandra): stores video records, upload session state, pointers to manifests.
- Queueing system: decouples ingestion from processing, allows elastic scaling of workers.
Flow:
- Client requests presigned upload -> API creates UploadSession in DB and returns presigned URLs.
- Client uploads parts directly to S3 (resumable); S3 event notifications or client callbacks tell the API to mark parts as uploaded.
- Client calls complete -> API verifies parts, marks video as uploaded, enqueues processing job.
- Processing DAG transcodes segments in parallel, writes manifests and segments to S3, updates metadata with manifest URLs and marks status "ready".
- Viewer requests video -> API returns metadata including CDN-accelerated manifest URL; client plays using adaptive streaming.
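A sketch of the completion step under the same assumptions as the presign sketch in the API section: verify which parts S3 actually has, finalize the multipart upload, then flip the status and enqueue transcoding (the metadata and queue helpers are hypothetical placeholders):

```python
import boto3

s3 = boto3.client("s3")
BUCKET = "raw-uploads"   # same assumed bucket as the presign sketch

def complete_upload(upload_session: dict, client_parts: list[dict]) -> None:
    """Finalize the multipart upload and hand the video to the processing pipeline."""
    key = f"originals/{upload_session['video_id']}"

    # Cross-check what the client claims against what S3 actually stored
    # (pagination beyond 1,000 parts omitted for brevity).
    listed = s3.list_parts(Bucket=BUCKET, Key=key, UploadId=upload_session["upload_id"])
    uploaded = {p["PartNumber"]: p["ETag"] for p in listed.get("Parts", [])}
    missing = [p["PartNumber"] for p in client_parts if p["PartNumber"] not in uploaded]
    if missing:
        raise ValueError(f"parts not uploaded yet: {missing}")

    s3.complete_multipart_upload(
        Bucket=BUCKET,
        Key=key,
        UploadId=upload_session["upload_id"],
        MultipartUpload={
            "Parts": [{"PartNumber": n, "ETag": etag} for n, etag in sorted(uploaded.items())]
        },
    )

    # Hypothetical helpers: update the metadata store and enqueue the transcoding job.
    mark_video_uploaded(upload_session["video_id"])
    enqueue_transcode_job(upload_session["video_id"], source_key=key)
```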
8. Detailed Design Decisions
Key decisions and rationale:
- Presigned multipart uploads -> avoid passing large blobs through app servers; enables resumable uploads and client-direct transfers.
- Use object store (S3) for all blob storage -> high durability, lifecycle policies, cheap storage tiers for older content.
- Transcoding pipeline as DAG (Temporal / orchestrator) -> parallelize CPU-bound work per segment; pass S3 URLs between steps.
- Store metadata in Cassandra -> high write throughput, horizontal scaling, eventual consistency; partition by videoId to enable point lookups.
- CDN for streaming -> reduce origin egress and latency; cache manifests & segments for adaptive bitrate.
- Adaptive bitrate (HLS/DASH) -> generate multiple renditions, segment durations ~2-6s, provide master manifest pointing to media manifests.
- Favor availability over consistency -> eventual consistency for metadata acceptable; use client-side retries and idempotency for APIs.
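As an illustration of the adaptive-bitrate decision above, a transcoding worker could shell out to ffmpeg roughly like this; the rendition ladder, codec choices, and 4-second segment length are assumptions, not a tuned production ladder:

```python
import subprocess

# Assumed rendition ladder: (name, height, video bitrate, audio bitrate).
RENDITIONS = [
    ("1080p", 1080, "5000k", "192k"),
    ("720p",   720, "2800k", "128k"),
    ("480p",   480, "1400k", "128k"),
    ("360p",   360,  "800k",  "96k"),
]

def transcode_to_hls(source_path: str, out_dir: str) -> None:
    """Produce one HLS media playlist per rendition; a master manifest ties them together."""
    for name, height, v_bitrate, a_bitrate in RENDITIONS:
        subprocess.run(
            [
                "ffmpeg", "-i", source_path,
                "-vf", f"scale=-2:{height}",      # keep aspect ratio, force even width
                "-c:v", "libx264", "-b:v", v_bitrate,
                "-c:a", "aac", "-b:a", a_bitrate,
                "-f", "hls",
                "-hls_time", "4",                 # ~4s segments (within the 2-6s range above)
                "-hls_playlist_type", "vod",
                f"{out_dir}/{name}.m3u8",
            ],
            check=True,
        )
    # The master manifest listing each rendition's playlist with its
    # BANDWIDTH/RESOLUTION attributes would be generated and uploaded here.
```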
Operational considerations:
- Autoscale worker pool based on queue depth.
- TTL/lifecycle for original uploads or intermediate artifacts to reduce storage costs.
- Warm caches for trending content; prefetch popular manifests/segments into CDN.
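A simple sizing heuristic for the queue-depth-based autoscaling point above; the per-job time and drain target are assumed numbers:

```python
import math

def desired_workers(queue_depth: int,
                    avg_job_seconds: float = 300.0,       # assumed ~5 min per transcode job
                    target_drain_seconds: float = 900.0,  # assumed: drain the backlog in 15 min
                    min_workers: int = 10,
                    max_workers: int = 2000) -> int:
    """Size the worker fleet so the current backlog drains within the target window."""
    needed = math.ceil(queue_depth * avg_job_seconds / target_drain_seconds)
    return max(min_workers, min(max_workers, needed))
```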
9. Bottlenecks & Scaling
Potential bottlenecks and mitigations:
- Hot videos (read hotspot): mitigate with CDN, a distributed cache, metadata replication across nodes, and aggressive TTL-based caching of metadata (a cache-aside sketch follows this list).
- Transcoding compute cost and latency: use highly parallel worker fleets, GPU/CPU optimized instances, and autoscale based on queue depth; consider on-demand vs. spot fleets.
- Upload spikes: scale presign API horizontally behind LB; rely on S3 for ingest; throttle or queue presign requests if needed.
- Large storage and cost: use storage tiers, compress with efficient codecs (VP9/AV1), and garbage-collect unused originals.
- Network egress: offload to CDN and use regional origins to reduce cross-region traffic.
- Consistency for session/parts: use idempotent part identifiers (checksums) and rely on object store multipart upload semantics for correctness.
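For the read-hotspot mitigation above, a cache-aside lookup with a short TTL might look like the following sketch; the redis-py calls are standard, while the key layout, TTL, and the Cassandra accessor are assumptions:

```python
import json
import redis

r = redis.Redis(host="metadata-cache", port=6379)   # assumed cache endpoint
METADATA_TTL_SECONDS = 60   # short TTL: slightly stale metadata is acceptable here

def get_video_metadata(video_id: str) -> dict:
    """Cache-aside read: serve hot videos from Redis, fall back to Cassandra."""
    key = f"video:{video_id}"
    cached = r.get(key)
    if cached is not None:
        return json.loads(cached)

    record = fetch_from_cassandra(video_id)   # hypothetical DB accessor
    r.setex(key, METADATA_TTL_SECONDS, json.dumps(record))
    return record
```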
10. Follow-up Questions / Extensions
- Support pipelined processing: the client uploads pre-segmented chunks and the server starts transcoding before the entire original is uploaded.
- Exact view counting and analytics: design a scalable counters system (approximate counters with periodic aggregation vs strong counters for billing).
- Live streaming: add ingest servers, lower-latency chunking, sub-second segmenting, and specialized CDN behaviors.
- Recommendations / personalization: build separate services with graph processing and feature stores (out of scope).
- Content moderation / copyright detection: integrate offline & nearline classifiers and human review pipelines.
11. Wrap-up
Design focuses on three pillars: reliable resumable uploads (presigned multipart to object store), scalable transcoding (DAG/orchestrator and parallel workers), and low-latency playback (multiple renditions + CDN). Use highly available components (S3, Cassandra, CDN) and elastic compute to meet peaks. Favor availability and eventual consistency for metadata, and design for caching and hotspot mitigation for viral content.