
News Feed: Signals to Improve Relevance

MEDIUM · 12 min · newsfeed · ranking · personalization · machine-learning · systems · meta
Asked at: Meta

Design the signals and the surrounding system to improve the relevance of Meta's News Feed, with the goal of increasing meaningful engagement and session quality.

1. The Question

Design the set of signals/features and the system architecture to improve the relevance of Meta's News Feed. The goal is to surface posts (from friends, creators, and pages, with ads mixed in) that increase meaningful conversations, engagement, and time spent, while respecting freshness, diversity, and user satisfaction.

2. Clarifying Questions

  • Do we mean "signals" as input features fed into ranking models, or also system-level signals (e.g., freshness, delivery guarantees)?
  • Is the scope limited to the News Feed in the Facebook (blue) app, i.e., the main timeline users see after login?
  • How do we define "relevance": short-term engagement (clicks, likes), dwell time, conversation likelihood, or long-term retention and user satisfaction?
  • Are we optimizing a single metric (e.g., CTR) or multiple objectives (meaningful conversations, time spent, ad revenue, content diversity, safety)?
  • What latency budget must the ranking system meet for feed generation (e.g., 100ms, 300ms)?
  • What privacy constraints or feature restrictions (PII hashing, differential privacy) should signals respect?

Notes from the prompt context to consider:

  • User journey: News Feed is landing surface; users scroll or navigate away.
  • Primary goals: increase engagement/time spent and create meaningful conversations.
  • User segments: Gen Z (low usage), Millennials (declining engagement), Boomers (usability issues).
  • Candidate signals suggested in the prompt: login activity, engagement type, scroll velocity, time per post, friend-connection signals, content signals, and contextual signals (geo/device/connectivity).

3. Requirements

Functional requirements:

  • Rank posts for each user so the most relevant items surface near the top of the feed.
  • Support posts from friends, creators, pages, and ads, with correct mixing and blending.
  • Incorporate recent interactions (likes, comments, shares) and new posts in near real time.
  • Expose signals to offline training pipelines and online serving (feature store).

Non-functional requirements:

  • Latency: end-to-end feed generation for a single request should meet tight SLAs (e.g., 100–300ms for online ranking).
  • Scalability: support hundreds of millions to billions of users and billions of posts per day.
  • Freshness: new posts and user interactions should affect ranking within seconds to minutes.
  • Reliability and availability: 99.9%+ for core ranking service.
  • Privacy and safety: enforce privacy-preserving transformations and moderation signals.

Metrics / success criteria:

  • Immediate: CTR, like/comment/share rate, dwell time per post/session, session length, scroll depth.
  • Longer-term: retention, DAU/MAU trends, rate of meaningful conversations (e.g., comments that lead to replies), user-reported satisfaction.
  • Business constraints: balance between engagement metrics and ad revenue / content quality.

4. Scale Estimates

Target scale (orders of magnitude):

  • Users: hundreds of millions to a few billion (global audience); plan capacity against peak DAU estimates.
  • Requests: 10^7–10^9 feed requests per day (depending on active users and sessions).
  • Posts: 10^8–10^9 posts created per day across all users; candidate corpus for a user can be millions of items if not filtered.
  • Interactions: 10^9+ interactions per day (likes, comments, shares, reactions).
  • Feature traffic: millions of feature reads/writes per second to a feature store and online stores.
  • ML model throughput: need to score thousands of candidates per request within latency budget; optimize candidate generation to reduce scoring load.

Assumptions should be validated against product telemetry; use percentiles for latency (p50/p95/p99).
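
As a quick back-of-envelope check, a minimal sketch in Python (the DAU, sessions-per-user, and candidates-per-request figures are illustrative assumptions consistent with the ranges above, not telemetry):

```python
# Back-of-envelope serving load; every input below is an illustrative assumption.
DAU = 5e8                      # assumed daily active users
SESSIONS_PER_USER = 2          # assumed feed sessions per user per day
SECONDS_PER_DAY = 86_400

feed_requests_per_day = DAU * SESSIONS_PER_USER      # 1e9/day, top of the range above
avg_qps = feed_requests_per_day / SECONDS_PER_DAY    # ~11,600 requests/sec
peak_qps = 3 * avg_qps                               # assume a 3x diurnal peak

CANDIDATES_PER_REQUEST = 500   # candidates surviving first-pass filtering
scores_per_sec = peak_qps * CANDIDATES_PER_REQUEST   # ~1.7e7 re-ranker scores/sec

print(f"avg QPS ~{avg_qps:,.0f}, peak QPS ~{peak_qps:,.0f}, scores/sec ~{scores_per_sec:,.0f}")
```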

5. Data Model

Core entities and representative fields (simplified):

  • User:

    • user_id
    • locale, age_range, device_type
    • last_login_at, last_activity_at
    • long_term_preferences (topics/tags)
    • privacy_settings
  • Post (Content):

    • post_id
    • author_id (user/page/creator)
    • created_at, updated_at
    • content_type (video, image, text, link, poll)
    • text_tags/topics, media_meta (duration, resolution)
    • publisher_type (friend, creator, page, ad)
    • moderation_flags
  • Interaction (event):

    • event_id
    • user_id, post_id
    • event_type (view, click, like, comment, share, reaction)
    • dwell_time_ms, scroll_position, session_id, timestamp
  • SocialGraphEdge:

    • source_user_id, target_user_id, edge_type (friend, follow, blocked)
    • affinity_score, last_interaction_at
  • FeatureRow (stored in feature store):

    • entity_id (user_id or user_id:post_id)
    • features: { k: value }
    • timestamp

Indexes and stores:

  • User profile store: fast KV for user features.
  • Online feature store: low-latency store for precomputed features used in ranking.
  • Post store / catalog: metadata and content pointers (media stored in CDN).
  • Interaction event log (streaming): append-only for downstream transforms and offline training.
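
A minimal sketch of these entities as Python dataclasses (field names follow the lists above; types and defaults are assumptions):

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

@dataclass
class User:
    user_id: str
    locale: str
    age_range: str
    device_type: str
    last_login_at: int                    # epoch millis
    last_activity_at: int
    long_term_preferences: List[str] = field(default_factory=list)  # topics/tags
    privacy_settings: Dict[str, bool] = field(default_factory=dict)

@dataclass
class Post:
    post_id: str
    author_id: str                        # user/page/creator
    created_at: int
    content_type: str                     # video, image, text, link, poll
    publisher_type: str                   # friend, creator, page, ad
    topics: List[str] = field(default_factory=list)
    media_meta: Dict[str, str] = field(default_factory=dict)  # duration, resolution
    moderation_flags: List[str] = field(default_factory=list)

@dataclass
class Interaction:
    event_id: str
    user_id: str
    post_id: str
    event_type: str                       # view, click, like, comment, share, reaction
    timestamp: int
    dwell_time_ms: Optional[int] = None
    session_id: Optional[str] = None

@dataclass
class FeatureRow:
    entity_id: str                        # user_id or "user_id:post_id"
    features: Dict[str, float]
    timestamp: int
```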

6. API Design

Representative APIs (HTTP/GRPC) exposed by the feed service:

  • GET /feed?user_id={uid}&session_id={sid}&cursor={cursor}&limit={n}

    • Input: user_id, session context, optional location/time, device info, cursor
    • Output: ranked list of {post_id, score, explanation, render_metadata}
    • Constraints: must return within latency SLA; supports prefetching/caching.
  • POST /event

    • Input: interaction events (view, click, reaction, dwell_time)
    • Output: ack
    • Usage: stream to event logging system and update online features.
  • GET /candidates?user_id={uid}&policy={policy}

    • Usage: lets experiments fetch the candidate pool directly from candidate generation.
    • Output: list of candidate post_ids with metadata.
  • GET /features?entity={user_id}|{user_id:post_id}&names=[...]

    • Input: list of feature names
    • Output: latest feature values (for explainability/debugging)
  • Admin endpoints: /retrain, /push-model, /metrics, /experiment-toggle
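
One way to implement the opaque cursor used by GET /feed (a sketch; the payload fields are assumptions, and a production cursor would typically also be signed):

```python
import base64
import json
import time

def encode_cursor(last_post_id: str, last_score: float) -> str:
    """Build the opaque pagination cursor returned with each feed page."""
    payload = {"post_id": last_post_id, "score": last_score, "issued_at": int(time.time())}
    return base64.urlsafe_b64encode(json.dumps(payload).encode()).decode()

def decode_cursor(cursor: str) -> dict:
    """Recover pagination state; treat a missing or garbled cursor as page one."""
    return json.loads(base64.urlsafe_b64decode(cursor.encode()).decode())

cursor = encode_cursor("post_991", 0.8123)
print(decode_cursor(cursor))  # {'post_id': 'post_991', 'score': 0.8123, 'issued_at': ...}
```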

Design considerations:

  • Use streaming for events to keep online stores updated asynchronously.
  • APIs should include request tracing ids for debugging and A/B test attribution.

7. High-Level Architecture

Key components and flow (high level):

  1. Event Ingestion & Streaming

    • Client events (views, interactions, post creates) -> Kafka/PubSub.
    • Streams feed both offline pipelines and online feature updaters.
  2. Offline Training

    • Batch jobs consume events -> feature engineering -> train ranking models (candidate scorer, engagement predictors, dwell-time models).
    • Outputs: model artifacts, feature specs.
  3. Feature Store

    • Offline feature store for training; online feature store (low-latency KV, e.g., Redis / specialized store) for serving.
    • Feature materialization pipelines keep online store fresh.
  4. Candidate Generation

    • Multiple candidate sources: social graph pulls, creator subscriptions, personalized recommendation models, topic-based fetchers, trending assembler.
    • Candidate filtering (policy, moderation, duplication, timeliness).
  5. Scoring & Re-ranking

    • Lightweight first-pass scoring (fast models) to reduce candidate set.
    • Heavier re-ranker (GBDT/NN) computes final scores using online features.
    • Multi-objective scoring combines signals for engagement, conversation likelihood, long-term retention.
  6. Blending & Business Rules

    • Blend posts, ads, and other surfaces according to policy.
    • Enforce diversity (authors/topics), freshness, content safety, and experiment buckets.
  7. Serving & Caching

    • Serve final ranked feed via edge caches and application servers.
    • Client prefetching and caching for scroll continuity.
  8. Monitoring & Feedback Loop

    • Real-time metrics, offline analysis, and model evaluation (A/B testing platform).

Diagram (textual):

Client -> Edge/API -> Candidate Gen -> Fast Scorer -> Online Feature Fetch -> Re-ranker -> Blending -> Client
   |                                                        ^
   +------------- Event stream / Feature updates -----------+

Notes:

  • Place heavy ML inference in dedicated GPU/TPU or optimized CPU clusters; use batching where possible.
  • Use feature caching and model distillation to meet latency constraints.
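
A runnable sketch of the multi-stage flow above, with stub stages standing in for the real services (fan-in sizes and stage logic are illustrative):

```python
import random
from dataclasses import dataclass

@dataclass
class Scored:
    post_id: str
    score: float

def candidate_generation(user_id: str) -> list[str]:
    """Recall-oriented pull from graph, subscriptions, topics, trending (stubbed)."""
    return [f"post_{i}" for i in range(10_000)]

def fast_score(user_id: str, candidates: list[str]) -> list[Scored]:
    """Cheap first-pass model; random scores stand in for a distilled model."""
    return [Scored(c, random.random()) for c in candidates]

def rerank(user_id: str, shortlist: list[Scored]) -> list[Scored]:
    """Heavy model using online features would run here; the sketch just re-sorts."""
    return sorted(shortlist, key=lambda s: s.score, reverse=True)

def blend(ranked: list[Scored]) -> list[Scored]:
    """Ads insertion, diversity, and safety rules would apply here (no-op in sketch)."""
    return ranked

def generate_feed(user_id: str, limit: int = 20) -> list[Scored]:
    candidates = candidate_generation(user_id)                     # ~10^4 items
    shortlist = sorted(fast_score(user_id, candidates),
                       key=lambda s: s.score, reverse=True)[:500]  # ~10^2 items
    return blend(rerank(user_id, shortlist))[:limit]

print([s.post_id for s in generate_feed("u123")[:3]])
```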

8. Detailed Design Decisions

A. Which signals to prioritize (high-level):

  • User-affinity signals: past interactions with the author (likes, comments, shares), frequency and recency of interactions, relationship strength (a recency-weighted sketch follows this list).
  • Engagement prediction signals: predicted probability of click, like, comment, share, and predicted dwell time.
  • Session signals: time since login, scroll velocity, session length, number of posts seen this session.
  • Content signals: content type (video, image, text), topic/tags, creator popularity, media quality, explicit calls-to-action.
  • Contextual signals: geo, local time of day, device type, connectivity, language, time zone.
  • Freshness / recency: post age, trending acceleration, breaking content indicator.
  • Social context signals: mutual friends reacting, number of friends who interacted, comment depth (conversation likelihood).
  • Diversity / novelty signals: entropy of author/topics in recent feed to avoid echo chambers.
  • Safety and moderation signals: violation likelihood, misinformation score.
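
A minimal sketch of the recency-weighted author-affinity signal from the first bullet (event weights and half-life are illustrative assumptions):

```python
import math
import time

# Exponential-decay affinity: recent interactions with an author count more.
EVENT_WEIGHTS = {"like": 1.0, "comment": 3.0, "share": 4.0, "view": 0.1}
HALF_LIFE_DAYS = 14.0
DECAY = math.log(2) / (HALF_LIFE_DAYS * 86_400)  # per-second decay rate

def author_affinity(interactions, now=None):
    """interactions: iterable of (event_type, unix_ts) between one user and one author."""
    now = time.time() if now is None else now
    return sum(
        EVENT_WEIGHTS.get(event, 0.0) * math.exp(-DECAY * (now - ts))
        for event, ts in interactions
    )

now = time.time()
# A comment yesterday outweighs a like from a month ago.
print(author_affinity([("comment", now - 86_400), ("like", now - 30 * 86_400)], now))
```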

B. Feature engineering and representation:

  • Use multi-timescale features: short-term (current session, last hour), medium-term (last week), long-term (last 6 months).
  • Session-based embeddings: derive short-term interest vectors from recent interactions.
  • Author affinity embedding: compressed vector for user-author relationship.
  • Hash high-cardinality categorical features (optionally feeding learned embeddings) to control feature explosion.
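
A sketch of the hashing trick referenced above (bucket count and hash choice are assumptions); prefixing the feature name with a time window yields the multi-timescale variants for free:

```python
import hashlib

NUM_BUCKETS = 2**20  # assumed hash space; trades rare collisions for bounded memory

def hash_feature(name: str, value: str) -> int:
    """Map a (feature_name, value) pair into a fixed bucket index, so the
    feature space stays bounded as vocabularies (topics, creators) grow."""
    digest = hashlib.md5(f"{name}={value}".encode()).digest()
    return int.from_bytes(digest[:8], "big") % NUM_BUCKETS

print(hash_feature("topic_7d", "cooking"))
print(hash_feature("topic_6mo", "cooking"))  # distinct bucket from the 7d window
```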

C. Model strategy:

  • Two-stage ranking: candidate generation (recall) -> scoring/re-ranking (precision).
  • Multi-task learning: predict multiple objectives (click, share, dwell) with shared backbone to leverage correlations.
  • Online learning: use streaming updates for certain parameters or use lightweight models that can be updated with recent interactions.
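
A minimal sketch of combining per-objective predictions into a single ranking score (weights are illustrative; in practice they are tuned through experiments and guardrails):

```python
# Per-objective weights; "meaningful conversation" objectives weighted highest.
OBJECTIVE_WEIGHTS = {
    "p_click": 0.2,
    "p_like": 0.3,
    "p_comment": 1.0,
    "p_share": 0.8,
    "expected_dwell_s": 0.05,
}

def combined_score(predictions: dict) -> float:
    """Weighted sum over whatever objective heads the multi-task model emits."""
    return sum(w * predictions.get(k, 0.0) for k, w in OBJECTIVE_WEIGHTS.items())

print(combined_score({"p_click": 0.4, "p_comment": 0.05, "expected_dwell_s": 12.0}))  # 0.73
```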

D. Handling cold start and sparsity:

  • For new users: rely on contextual signals, device locale, onboarding preferences, popularity signals.
  • For new posts: use content-based signals, creator reputation, topic-based similarity to known interests.
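
A sketch of a cold-start score for a brand-new post along these lines (blend weights and field names are assumptions):

```python
def cold_start_score(post: dict, user_topic_affinity: dict) -> float:
    """Score a post with no engagement history: blend content-topic match,
    creator reputation, and a global popularity prior."""
    topic_match = max(
        (user_topic_affinity.get(t, 0.0) for t in post.get("topics", [])),
        default=0.0,
    )
    return (0.5 * topic_match
            + 0.3 * post.get("creator_reputation", 0.0)
            + 0.2 * post.get("popularity_prior", 0.0))

print(cold_start_score(
    {"topics": ["cooking"], "creator_reputation": 0.7, "popularity_prior": 0.4},
    {"cooking": 0.9},
))  # 0.74
```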

E. Privacy and policy:

  • Anonymize or aggregate sensitive features where possible; follow opt-out and data retention policies.
  • Provide explainability signals (why a post was shown) and controls for user preferences.
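
One privacy-preserving transformation implied here is keyed pseudonymization of identifiers before they enter feature pipelines (a sketch; key storage and rotation are elided):

```python
import hashlib
import hmac

SECRET_KEY = b"rotate-me-regularly"  # illustrative; real keys live in a secrets manager

def pseudonymize(raw_id: str) -> str:
    """Keyed hash so features can be joined on a stable token without
    exposing the raw identifier; irreversible without the key."""
    return hmac.new(SECRET_KEY, raw_id.encode(), hashlib.sha256).hexdigest()[:16]

print(pseudonymize("user_42"))
```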

9. Bottlenecks & Scaling

Potential bottlenecks and mitigations:

  • Feature freshness vs latency:

    • Problem: high-cardinality user:post features require low-latency updates.
    • Mitigation: separate fast-path features (session, recent interactions) updated in-memory; materialize heavier features offline.
  • Candidate explosion:

    • Problem: scoring millions of candidates per request is infeasible.
    • Mitigation: aggressive candidate generation heuristics, multi-stage filtering with cheap first-pass models and deduping.
  • Model inference throughput:

    • Problem: high QPS for re-ranker.
    • Mitigation: batch inference, model quantization/distillation, caching of scores for short windows, serve lightweight models at edge.
  • Event ingestion and backpressure:

    • Problem: spike traffic can overwhelm streaming system.
    • Mitigation: autoscaling, partitioning, rate-limiting, graceful degradation (drop non-critical events).
  • A/B test variance and experiment contamination:

    • Mitigation: deterministic bucketing (sketched after this list), strong experiment telemetry, guardrails for negative metric drift.
  • Moderation & safety throughput:

    • Use hybrid human + automated pipelines; prioritize posts with high engagement potential for faster review.
  • Bias and filter bubble:

    • Introduce diversity constraints, tune exploration vs exploitation, monitor for echo chamber signals.
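
A sketch of the deterministic experiment bucketing mentioned above (hash choice and bucket count are assumptions):

```python
import hashlib

def experiment_bucket(user_id: str, experiment_id: str, num_buckets: int = 100) -> int:
    """Same user always lands in the same bucket for a given experiment, and
    buckets are independent across experiments (limits contamination)."""
    digest = hashlib.sha256(f"{experiment_id}:{user_id}".encode()).digest()
    return int.from_bytes(digest[:8], "big") % num_buckets

in_treatment = experiment_bucket("user_42", "feed_rerank_v7") < 10  # 10% rollout
print(in_treatment)
```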

10. Follow-up Questions / Extensions

  • How to incorporate cross-platform signals (Instagram, WhatsApp) while respecting privacy boundaries?
  • How to jointly optimize feed ranking and ad allocation to balance user experience and revenue?
  • How to detect and reduce low-quality or sensational content that increases short-term engagement but harms long-term retention?
  • How to incorporate real-time signals like live events or breaking news that need immediate amplification?
  • How to support user controls for more/less of certain content types and use that as a signal?
  • Can we use causal methods to estimate long-term retention impact of ranking changes rather than short-term proxies?

11. Wrap-up

Focus on a mixture of social affinity, engagement-prediction, session/contextual, content, and freshness signals. Architect the system as a multi-stage ranking pipeline with robust feature stores, streaming ingestion, and low-latency online serving. Prioritize signal freshness for high-impact features, balance multiple objectives, and continuously monitor real-world metrics via experiments.
