
Log Ordered Messages System Design

Difficulty: Medium · 12 min · Tags: logging, ordering, kafka, distributed-systems, scalability
Asked at: Google, Microsoft

Design a scalable, reliable system to log messages while preserving their original order.

1. The Question

Design a system that accepts messages from producers and persists them such that the messages are logged in the exact order they were produced. The system should be durable, highly available, and scalable to large message volumes. Consider trade-offs between strict ordering, throughput, latency, and fault tolerance.

2. Clarifying Questions

  • Are messages single-stream (global ordering) or per-key ordering (e.g., per user/session)?
  • What is the expected write throughput (messages/sec) and size distribution?
  • Is ordering required across geographically distributed producers, or only per-region?
  • What are SLAs for write latency and durability (e.g., ack semantics)?
  • Are duplicates acceptable, or do we need exactly-once semantics?
  • How long must logs be retained and what query patterns are expected?

3. Requirements

Functional:

  • Accept messages from many producers and persist them.
  • Preserve ordering guarantees as specified (global or per-key).
  • Support read/query of logged messages.

Non-functional:

  • Durability: no logged message loss within retention window.
  • Availability: system tolerates node failures.
  • Scalability: handle spikes and growth in throughput.
  • Performance: bounded write latency; acceptable read latency for queries.
  • Security: authentication, authorization, encryption in transit and at rest.

4. Scale Estimates

  • Example baseline: 100k messages/sec, average message 1 KB → ~100 MB/s write throughput.
  • Peak might be 2–3x baseline.
  • Storage: 100 MB/s ≈ 8.6 TB/day of raw ingest; factor in the retention policy and replication factor to size disk capacity. A worked calculation follows this list.
  • Size the number of partitions and brokers to the required throughput and IOPS; target roughly 10k–20k msgs/sec per partition, depending on hardware.
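
A quick sanity check of these estimates in Python. The peak factor, per-partition capacity, retention window, and replication factor below are illustrative assumptions, not numbers given in the question:

import math

# Baseline figures from the bullets above; the rest are assumed for illustration.
MSGS_PER_SEC = 100_000        # baseline producer rate
AVG_MSG_BYTES = 1_024         # ~1 KB average message
PEAK_FACTOR = 3               # size for ~3x baseline bursts
PARTITION_CAPACITY = 15_000   # assumed sustainable msgs/sec per partition
RETENTION_DAYS = 7            # hypothetical hot-retention window
REPLICATION_FACTOR = 3        # matches the durability recommendation below

write_mb_per_sec = MSGS_PER_SEC * AVG_MSG_BYTES / 1e6
daily_tb = write_mb_per_sec * 86_400 / 1e6
hot_storage_tb = daily_tb * RETENTION_DAYS * REPLICATION_FACTOR
partitions_needed = math.ceil(MSGS_PER_SEC * PEAK_FACTOR / PARTITION_CAPACITY)

print(f"Write throughput: ~{write_mb_per_sec:.0f} MB/s")   # ~102 MB/s
print(f"Raw ingest per day: ~{daily_tb:.1f} TB")           # ~8.8 TB
print(f"Hot storage ({RETENTION_DAYS}d, RF={REPLICATION_FACTOR}): ~{hot_storage_tb:.0f} TB")
print(f"Partitions needed at peak: {partitions_needed}")   # 20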

5. Data Model

Use a simple append-only log record structure. Example JSON schema:

{
  "message_id": "uuid-v4",
  "producer_id": "producer-123",
  "timestamp": "2025-09-23T12:00:00Z",
  "key": "optional-key-for-partitioning",
  "payload": { ... },
  "metadata": { "schema_version": 1 }
}

Storage layout: append-only segments per partition/topic. Each record includes an offset (monotonic per partition) used to enforce order.
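
A minimal in-memory sketch of this layout, assuming the record fields from the JSON schema above (a real implementation would persist segments to disk; the names here are illustrative):

import time
import uuid
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class LogRecord:
    offset: int                  # monotonic per partition; this is what enforces order
    message_id: str
    producer_id: str
    timestamp: float
    key: Optional[str]
    payload: dict

@dataclass
class PartitionLog:
    """Append-only log for one partition; offsets are assigned at append time."""
    records: list = field(default_factory=list)

    def append(self, producer_id: str, payload: dict, key: Optional[str] = None) -> int:
        offset = len(self.records)   # next monotonic offset
        self.records.append(LogRecord(
            offset=offset,
            message_id=str(uuid.uuid4()),
            producer_id=producer_id,
            timestamp=time.time(),
            key=key,
            payload=payload,
        ))
        return offset

    def read(self, start_offset: int, limit: int = 100) -> list:
        # Reads are served in offset order, mirroring the consumer API below.
        return self.records[start_offset:start_offset + limit]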

6. API Design

Producer API (HTTP/gRPC):

  • POST /v1/messages
    • Body: { key?, payload }
    • Response: { success: true, partition: 5, offset: 12345 }

Consumer / Query API:

  • GET /v1/messages?partition=5&offset=12345&limit=100
    • Returns ordered messages starting at offset.

Operational APIs:

  • POST /v1/topics
  • GET /v1/topics/{topic}/partitions

Notes: prefer gRPC for lower latency and streaming. Provide synchronous ack options (a producer-side sketch follows this list):

  • ack=leader (low latency, at-least-once)
  • ack=replicated (wait for replication, stronger durability)
  • transactional writes for exactly-once across downstream systems.
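
A sketch of how these ack options map onto a Kafka client, using the confluent-kafka Python package; the broker address, topic name, and key are placeholders, and acks="all" corresponds to the ack=replicated option above (acks="1" would be ack=leader):

from confluent_kafka import Producer

producer = Producer({
    "bootstrap.servers": "broker-1:9092",   # placeholder address
    "acks": "all",                          # wait for in-sync replicas (ack=replicated)
    "enable.idempotence": True,             # broker-side dedup of producer retries
})

def on_delivery(err, msg):
    # Mirrors the { partition, offset } fields in the HTTP response above.
    if err is not None:
        print(f"delivery failed: {err}")
    else:
        print(f"partition={msg.partition()} offset={msg.offset()}")

producer.produce(
    "messages-v1",                 # placeholder topic
    key=b"user-42",                # key determines the partition, i.e. the ordering scope
    value=b'{"event": "login"}',
    on_delivery=on_delivery,
)
producer.flush()                   # block until the delivery report has fired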

7. High-Level Architecture

Recommended components:

  • Producers → Load Balancer → Ingest Gateway
  • Message Broker (Apache Kafka preferred)
    • Topics with partitions, replication, and durable segment storage
  • Logging Service / Consumers
    • Consume from broker, optionally enrich, and write to long-term store (S3, HDFS) or time-series DB
  • Metadata Service
    • Manage topics, partitions, schema registry
  • Monitoring & Ops
    • Prometheus, Grafana, alerting, autoscaling

Flow:

  1. Producer sends message to broker via ingest gateway.
  2. Broker appends to partition log; assigns offset.
  3. Consumers read partitions in offset order and persist to long-term storage or serve queries.

Rationale: Kafka provides durable, partitioned logs that preserve order per partition and support high throughput and replication.
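
A consumer-side sketch of step 3, again using the confluent-kafka Python package; the group id, topic, and long-term-store writer are placeholders, and offsets are committed only after the sink write so the pipeline is at-least-once:

from confluent_kafka import Consumer

consumer = Consumer({
    "bootstrap.servers": "broker-1:9092",   # placeholder
    "group.id": "log-archiver",             # placeholder consumer group
    "auto.offset.reset": "earliest",
    "enable.auto.commit": False,            # commit manually, after persisting
})
consumer.subscribe(["messages-v1"])

def persist_to_long_term_store(record_bytes):
    # Placeholder for the S3/HDFS/time-series write described above.
    pass

try:
    while True:
        msg = consumer.poll(timeout=1.0)
        if msg is None or msg.error():
            continue
        # Within a partition, messages arrive in offset order.
        persist_to_long_term_store(msg.value())
        consumer.commit(msg)                # advance the committed offset after persisting
finally:
    consumer.close()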

8. Detailed Design Decisions

  • Ordering granularity: use per-key ordering by hashing the key to a partition (see the sketch after this list). Global ordering requires a single partition, which limits throughput.
  • Broker choice: Kafka for scalable, replicated append-only logs that preserve order per partition.
  • Partitioning: choose partition key to group messages that must be ordered together.
  • Replication factor: >=3 for durability and leader failover.
  • Acks: use acks=all together with min.insync.replicas so writes are not acknowledged until a quorum of replicas has them, avoiding data loss on leader failure.
  • Exactly-once: enable Kafka transactions if consumers produce to downstream topics or sinks; otherwise document at-least-once behavior and deduplication strategy.
  • Consumer groups: single consumer per partition to maintain order; scale by adding partitions.
  • Long-term storage: offload older segments to object storage (S3) for cost-effective retention while keeping recent data hot in local storage.
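
To make the per-key ordering decision concrete, a small illustration of hashing a key to a partition (this shows the general idea only, not Kafka's murmur2-based default partitioner):

import hashlib

NUM_PARTITIONS = 24   # assumed partition count

def partition_for(key: str, num_partitions: int = NUM_PARTITIONS) -> int:
    """Messages sharing a key always land on the same partition, so their
    relative order is preserved; messages with different keys may interleave."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % num_partitions

print(partition_for("user-42"))                               # stable partition id
print(partition_for("user-42") == partition_for("user-42"))   # True: deterministic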

9. Bottlenecks & Scaling

  • Ordering vs throughput: strict global ordering is a bottleneck; prefer per-key ordering.
  • Partition hot spots: popular keys create write hotspots; mitigate with key sharding or hierarchical keys (see the salting sketch after this list).
  • Broker I/O and network: ensure sufficient disk throughput and network capacity; use SSDs for high throughput.
  • Consumer throughput: ensure consumers can keep up; use batching and parallel processing where ordering allows.
  • Leader failures: momentary stall while new leader is elected; tune replication and ISR settings.
  • Storage costs and retention: offload to cold storage and compact logs if appropriate.
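
One way to mitigate a hot key is to salt it into a small number of sub-keys, accepting that ordering then holds only per sub-key rather than for the original key. A hedged sketch (the shard count and hot-key detection are assumptions):

import random

HOT_KEY_SHARDS = 4   # assumed fan-out for keys known to be hot

def salted_key(key: str, is_hot: bool) -> str:
    """Spread a hot key across sub-keys like "user-42#0" ... "user-42#3".
    This trades ordering scope (now per sub-key) for write parallelism."""
    if not is_hot:
        return key
    return f"{key}#{random.randrange(HOT_KEY_SHARDS)}"

print(salted_key("user-42", is_hot=True))    # e.g. "user-42#2"
print(salted_key("user-7", is_hot=False))    # "user-7"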

Scaling strategies:

  • Add partitions to increase parallelism (note rebalancing cost).
  • Scale broker cluster horizontally and tune JVM/OS for I/O.
  • Use multiple regions with per-region ordering and cross-region replication when global ordering across regions isn't required.

10. Follow-up Questions / Extensions

  • How to support global ordering across regions? (Use single-region write or a global sequencer — trade-offs.)
  • How to provide low-latency reads while ensuring ordering? (Keep recent data in local fast store.)
  • Support for exactly-once delivery to sinks (use transactions and idempotent sinks).
  • How to handle schema evolution? (Use schema registry and versioning.)
  • Can we provide ordered queries across multiple keys? (Requires merge-join and time-based ordering; more complex.)

11. Wrap-up

A practical design uses a partitioned, replicated append-only log (e.g., Kafka) to preserve order at the partition level, combined with careful partitioning by key to meet ordering requirements while scaling throughput. Achieve durability with replication and quorum acks, mitigate hotspots with sharding, and offload older data to object storage. Trade-offs center on ordering granularity vs throughput and complexity of providing exactly-once semantics.
