
Design Instagram (Feed & Photo Upload)

Medium | 10 min | Tags: feed, photos, scalability, cdn, database, cache, s3
Asked at: Misc

Design a scalable Instagram-like service supporting photo upload, follows, and a user feed for ~10M monthly users.

1. The Question

Design an Instagram-like service that supports: uploading images from mobile clients, following other users, and generating a news feed of images. Target scale: ~10 million monthly active users. Keep reliability and reasonable latency in mind.

2. Clarifying Questions

  • Do we need comments/likes/notifications initially? (Assume no, can be added later.)
  • Should images be public by default or support privacy? (Assume public + private account option.)
  • Target latency for feed loads (e.g., <500ms for feed metadata; images can load progressively).
  • Supported clients: mobile native (iOS/Android) and web.
  • Are we optimizing for cost or for maximum throughput? (Balance both; design for horizontal scale.)

3. Requirements

  • Functional
    • Upload photos (mobile/web).
    • Follow/unfollow users.
    • Generate a personal news feed showing recent photos from followed users.
  • Non-functional
    • Target scale: 10M monthly active users.
    • Read-heavy workload (many feed reads vs fewer uploads).
    • Reasonable freshness: feed should reflect new posts within seconds-to-minutes.
    • High availability and durability for photos.

4. Scale Estimates

Assumptions: 10M monthly active users, 2 photo uploads/user/month, avg photo size 5 MB (including metadata). Calculations:

  • Uploads/month = 10M * 2 = 20M photos
  • Storage/month = 20M * 5 MB = 100M MB = ~100 TB/month
  • Annual storage ~1.2 PB (before dedup/retention policies)
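  • Read/write ratio: assume reads >> writes (e.g., 100:1). Averaged over a 30-day month, 20M uploads is ~8 uploads/s, so 100:1 implies roughly 770 reads/s on average. Peak QPS depends on DAU and usage patterns and can be several times the average; design for horizontal scaling rather than a fixed ceiling.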
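The arithmetic above can be reproduced as a small back-of-envelope script. It uses only the stated assumptions plus one of its own: a 30-day month for the per-second rates. Units are decimal (1 TB = 1,000,000 MB).

```python
# Back-of-envelope estimates from the assumptions in this section.
MAU = 10_000_000                    # monthly active users
UPLOADS_PER_USER = 2                # photos per user per month
PHOTO_MB = 5                        # average photo size in MB (decimal)
READ_WRITE_RATIO = 100              # assumed reads per upload
SECONDS_PER_MONTH = 30 * 24 * 3600  # assumption: 30-day month

uploads_per_month = MAU * UPLOADS_PER_USER                        # 20M photos
storage_tb_per_month = uploads_per_month * PHOTO_MB / 1_000_000   # MB -> TB
storage_pb_per_year = storage_tb_per_month * 12 / 1_000           # TB -> PB
avg_uploads_per_sec = uploads_per_month / SECONDS_PER_MONTH
avg_reads_per_sec = avg_uploads_per_sec * READ_WRITE_RATIO

print(f"uploads/month: {uploads_per_month:,}")
print(f"storage/month: {storage_tb_per_month:.0f} TB")
print(f"storage/year:  {storage_pb_per_year:.1f} PB")
print(f"avg uploads/s: {avg_uploads_per_sec:.1f}")
print(f"avg reads/s:   {avg_reads_per_sec:.0f}")
```

These are averages only; storing several resolutions per photo and peak-hour traffic both push the real numbers higher.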

5. Data Model

Relational metadata DB (Postgres/MySQL) for relational queries; object store for blobs. Tables (simplified):

  • users

    • id (PK)
    • username
    • email
    • display_name
    • created_at
    • privacy_settings
  • photos

    • id (PK)
    • user_id (FK -> users.id)
    • caption
    • s3_path (or object store URL)
    • width, height
    • created_at
    • location
  • follows

    • follower_id (FK -> users.id)
    • followee_id (FK -> users.id)
    • created_at
    • primary key (follower_id, followee_id)

Notes:

  • Store large binary photo data in object storage (S3/GCS) and only keep references in relational DB.
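  • Index photos on (user_id, created_at) for per-user timelines. The (follower_id, followee_id) primary key already serves lookups by follower_id; add an index on follows.followee_id for fan-out (finding a user's followers).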
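A minimal sketch of the tables above as DDL, run against in-memory SQLite purely so it executes standalone; column types, constraints, and index names are assumptions, and a production Postgres/MySQL schema would use richer types (timestamps, JSON for privacy_settings). Because follower_id leads the composite primary key, follower-side lookups are already indexed, while "who follows X?" (needed for fan-out) gets its own index.

```python
import sqlite3

# Simplified schema; SQLite stands in for Postgres/MySQL so the sketch runs anywhere.
SCHEMA = """
CREATE TABLE users (
    id               INTEGER PRIMARY KEY,
    username         TEXT NOT NULL UNIQUE,
    email            TEXT NOT NULL UNIQUE,
    display_name     TEXT,
    created_at       TEXT NOT NULL,
    privacy_settings TEXT            -- format (e.g. JSON) is an assumption
);
CREATE TABLE photos (
    id         INTEGER PRIMARY KEY,
    user_id    INTEGER NOT NULL REFERENCES users(id),
    caption    TEXT,
    s3_path    TEXT NOT NULL,        -- object key/URL only; bytes live in object storage
    width      INTEGER,
    height     INTEGER,
    created_at TEXT NOT NULL,
    location   TEXT
);
CREATE TABLE follows (
    follower_id INTEGER NOT NULL REFERENCES users(id),
    followee_id INTEGER NOT NULL REFERENCES users(id),
    created_at  TEXT NOT NULL,
    PRIMARY KEY (follower_id, followee_id)  -- also serves lookups by follower_id
);
-- Per-user timelines: a user's newest photos.
CREATE INDEX idx_photos_user_created ON photos (user_id, created_at);
-- Reverse lookup ("who follows X?") used by fan-out on write.
CREATE INDEX idx_follows_followee ON follows (followee_id);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
```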

6. API Design

Key endpoints (REST-style):

  • POST /api/v1/photos
    • Create photo metadata and obtain a presigned URL; the client then uploads the image directly to object storage.
  • GET /api/v1/users/{id}/photos
    • List photos for a user (paginated).
  • POST /api/v1/users/{id}/follow
    • Follow a user; DELETE on the same path unfollows.
  • GET /api/v1/feed
    • Get current user's feed (paginated).

Implementation notes:

  • Use presigned URLs for direct client -> object storage uploads to reduce server bandwidth.
  • Use cursor-based (keyset) pagination rather than offsets, so pages stay fast and stable as new posts arrive.
  • Authenticate requests with tokens; validate permissions for private accounts.
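Cursor-based paging can be sketched as follows, assuming items are ordered newest-first by (created_at, id) so ties on timestamp are broken deterministically. The cursor is the last returned item's sort key, base64-encoded so clients treat it as opaque; the function names are illustrative. In SQL the filter corresponds to a row comparison such as `WHERE (created_at, id) < ($1, $2) ORDER BY created_at DESC, id DESC LIMIT $3`, which Postgres supports.

```python
import base64
import json

def encode_cursor(created_at, photo_id):
    """Opaque cursor: base64 of the last item's sort key (created_at, id)."""
    raw = json.dumps([created_at, photo_id]).encode()
    return base64.urlsafe_b64encode(raw).decode()

def decode_cursor(cursor):
    created_at, photo_id = json.loads(base64.urlsafe_b64decode(cursor.encode()))
    return created_at, photo_id

def get_page(items, cursor=None, limit=2):
    """Keyset pagination over items sorted by (created_at, id) descending.

    items: list of dicts with 'id' and 'created_at'. Returns (page, next_cursor or None).
    """
    if cursor is not None:
        after = decode_cursor(cursor)
        # Keep only items strictly older than the cursor's sort key.
        items = [it for it in items if (it["created_at"], it["id"]) < after]
    page = items[:limit]
    next_cursor = None
    if len(items) > limit:
        last = page[-1]
        next_cursor = encode_cursor(last["created_at"], last["id"])
    return page, next_cursor
```

Unlike OFFSET paging, a new post arriving between requests does not shift later pages, so items are neither skipped nor repeated.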

7. High-Level Architecture

Components:

  • Clients: mobile apps / web.
  • CDN: caches images and static assets at edge.
  • Load Balancer / API Gateway: routes requests to appropriate services.
  • App Servers: split into a read-serving layer and a write/upload layer (can start as a monolith and split later).
  • Cache: Redis or Memcached for hot reads, feed cache, and user/session data.
  • Metadata DB: primary relational DB with read replicas.
  • Object Storage: S3/GCS for photo blobs, replicated and durable.
  • Feed Generation Service: precomputes feeds (push/fan-out or pull/fan-in tradeoffs) and stores results in cache.
  • Background Workers / Queues: process uploads, image transcoding, cache invalidation, feed computation.

Data flow:

  • Upload: client obtains presigned URL, uploads to object store -> app server writes metadata to DB -> enqueue feed update to workers -> update cache.
  • Read feed: client requests feed -> app server checks the feed cache (where precomputed feeds are stored) -> on a miss, composes the feed from the DB and caches the result -> client loads images via CDN using object URLs.
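The read path above is a cache-aside pattern. A minimal sketch, assuming a plain dict with expiry timestamps in place of Redis and in-memory maps in place of the follows/photos tables; FeedReader, FEED_TTL_SECONDS, and FEED_SIZE are hypothetical names and values. The `invalidate` hook is where the upload path's "update cache" step would plug in.

```python
import time

FEED_TTL_SECONDS = 60  # assumed freshness window
FEED_SIZE = 20         # assumed number of items kept per cached feed

class FeedReader:
    def __init__(self, follows, photos_by_user):
        self.follows = follows                # follower_id -> set of followee_ids
        self.photos_by_user = photos_by_user  # user_id -> list of (created_at, photo_id)
        self.cache = {}                       # user_id -> (expires_at, feed); stands in for Redis
        self.db_queries = 0                   # counts misses that fall through to the "DB"

    def _compose_from_db(self, user_id):
        self.db_queries += 1
        candidates = []
        for followee in self.follows.get(user_id, ()):
            candidates.extend(self.photos_by_user.get(followee, []))
        candidates.sort(reverse=True)  # newest first
        return [photo_id for _, photo_id in candidates[:FEED_SIZE]]

    def get_feed(self, user_id, now=None):
        now = time.time() if now is None else now
        entry = self.cache.get(user_id)
        if entry is not None and entry[0] > now:
            return entry[1]                     # cache hit
        feed = self._compose_from_db(user_id)   # cache miss: build from DB, then cache
        self.cache[user_id] = (now + FEED_TTL_SECONDS, feed)
        return feed

    def invalidate(self, user_id):
        """Drop a cached feed, e.g. after a followee uploads."""
        self.cache.pop(user_id, None)
```

The TTL bounds staleness even if an invalidation is lost, which matches the "seconds-to-minutes" freshness requirement.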

8. Detailed Design Decisions

  • Database choice: relational DB to model users/photos/follows and support joins (e.g., Postgres). Use read replicas for scale.
  • Object store: S3 for large binary storage; store multiple image sizes/variants.
  • Caching: Redis for sessions, user profile cache, and feed cache. Use write-through or explicit cache invalidation on uploads.
  • Feed strategy: precompute feeds (fan-out) for users with moderate follower counts; for celebrities with many followers, use hybrid approach (fan-out to active followers, otherwise compute on read).
  • Image serving: generate multiple resolutions at upload (worker) and serve via CDN to reduce latency and bandwidth.
  • Upload pattern: presigned URLs to allow clients to upload directly to object store and reduce server CPU/bandwidth costs.
  • Consistency: eventual consistency acceptable for feed freshness; strong consistency for follow/unfollow operations where necessary.
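The hybrid feed strategy can be sketched in a few lines: authors below a follower threshold fan out on write into each follower's precomputed feed, while posts from authors above it are merged in at read time (fan-in). This is a simplified illustration; HybridFeed and CELEBRITY_THRESHOLD are hypothetical, and it omits the "fan out to active followers only" refinement, backfilling for new follows, and de-duplication when an author crosses the threshold.

```python
from collections import defaultdict

CELEBRITY_THRESHOLD = 10_000  # hypothetical cutoff; tune from real follower distributions

class HybridFeed:
    def __init__(self, threshold=CELEBRITY_THRESHOLD):
        self.threshold = threshold
        self.followers = defaultdict(set)  # user -> set of followers
        self.following = defaultdict(set)  # user -> set of followees
        self.posts = defaultdict(list)     # author -> [(created_at, photo_id)]
        self.inbox = defaultdict(list)     # precomputed feed per user (fan-out target)

    def follow(self, follower, followee):
        self.followers[followee].add(follower)
        self.following[follower].add(followee)

    def is_celebrity(self, user):
        return len(self.followers[user]) >= self.threshold

    def post(self, author, created_at, photo_id):
        self.posts[author].append((created_at, photo_id))
        if not self.is_celebrity(author):
            # Fan-out on write: push into every follower's precomputed feed.
            for follower in self.followers[author]:
                self.inbox[follower].append((created_at, photo_id))
        # Celebrity posts are not pushed; readers pull them in feed().

    def feed(self, user, limit=20):
        items = list(self.inbox[user])
        for followee in self.following[user]:
            if self.is_celebrity(followee):
                items.extend(self.posts[followee])  # fan-in on read
        items.sort(reverse=True)  # newest first
        return [photo_id for _, photo_id in items[:limit]]
```

The write cost of a post is bounded by the threshold, and the read cost grows only with the number of celebrities a user follows, which is usually small.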

9. Bottlenecks & Scaling

  • Storage growth: use lifecycle policies, tiering to colder storage classes (e.g., S3 Glacier, GCS Nearline), and retention policies.
  • Read QPS: mitigate with CDN, cache, and read replicas.
  • Hot users (celebrities): fan-out to millions is expensive. Use hybrid fan-out / fan-in, rate-limited background jobs, and special handling for very large follower sets.
  • Feed generation compute: distribute with background workers; shard by user ID; use queues (Kafka/RabbitMQ).
  • DB write throughput: partitioning/sharding users across multiple DB instances when single DB can't handle load.
  • Network bandwidth: use presigned uploads and serve photos from CDN to reduce origin bandwidth.
  • Image processing: autoscale worker pool for transcoding; use efficient libraries and small worker VMs.
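Sharding by user ID can be sketched with a stable hash into a fixed number of logical shards, which a directory maps onto physical DB hosts. A hashlib digest is used rather than Python's built-in `hash()`, whose value for strings is randomized per process. NUM_LOGICAL_SHARDS, the round-robin assignment, and the host names are assumptions; the point of many logical shards is that rebalancing moves whole logical shards between hosts instead of rehashing every user, as plain `user_id % num_hosts` would.

```python
import hashlib

NUM_LOGICAL_SHARDS = 1024  # hypothetical; far more logical shards than physical DBs

def logical_shard(user_id):
    """Stable user -> logical shard mapping (same answer in every process)."""
    digest = hashlib.sha256(str(user_id).encode()).digest()
    return int.from_bytes(digest[:8], "big") % NUM_LOGICAL_SHARDS

def build_shard_map(db_hosts):
    """Assign logical shards to hosts round-robin; in practice a directory table."""
    return {s: db_hosts[s % len(db_hosts)] for s in range(NUM_LOGICAL_SHARDS)}

def host_for_user(user_id, shard_map):
    """All of a user's rows (profile, photos, outgoing follows) live on this host."""
    return shard_map[logical_shard(user_id)]
```

Queries that span users (e.g., composing a feed from many followees) then touch several shards, which is one more reason the design leans on precomputed feeds in cache.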

10. Follow-up Questions / Extensions

  • Add likes, comments, and notifications: model additional tables and design asynchronous notification pipelines.
  • Explore page and personalized recommendations: add ranking service, ML models, and offline feature pipelines.
  • Stories/Reels (ephemeral content, video): add video storage, streaming infra, and time-limited content lifecycle.
  • Search: build inverted indexes, e.g., with Elasticsearch or similar.
  • Rate limiting, abuse detection, and moderation: add per-user rate limits, automated abuse detection, content moderation workflows, and user reporting.
  • Multi-region deployment and geo-replication for lower latency and compliance.

11. Wrap-up

Summary: Use a relational DB for metadata, object storage for photos, CDN for serving images, and Redis for caching. Split read and write services, precompute feeds where possible, and use background workers for heavy processing. Plan for read-heavy scaling, handle hot users with hybrid feed strategies, and optimize uploads with presigned URLs.
