System Design Case Study: Designing a Scalable Notification Service

1. Overview

A scalable notification service is a critical backend component for modern applications, responsible for delivering timely, personalized messages to users across multiple channels (push notifications, email, SMS, in-app, WhatsApp, etc.). This system powers user engagement in products like Facebook, Instagram, Uber, Slack, Amazon, and banking apps.

The design must handle extreme fan-out (one event → millions of recipients), support real-time delivery where required, respect user preferences and Do-Not-Disturb settings, and provide at-least-once delivery semantics (with idempotency to approximate exactly-once) at high reliability. Real-world inspirations include:

  • AWS SNS + SQS + Pinpoint
  • Firebase Cloud Messaging (FCM)
  • OneSignal / Braze
  • Facebook’s Iris/Taurus
  • Twitter’s Fanout service

The core challenge is the inverted traffic pattern: writes are low (one event), but reads/deliveries are extremely high (fan-out to N users).

2. Functional Requirements

  • Notification types:
    • Real-time (e.g., chat messages, ride updates)
    • Batch/digest (daily summaries)
    • Transactional (order confirmation)
    • Marketing/promotional
  • Channels:
    • Mobile push (APNs, FCM)
    • Email (SMTP/SES)
    • SMS (Twilio/Nexmo)
    • In-app inbox
    • Web push (VAPID)
    • WebSocket for live apps
  • Features:
    • Templating (Handlebars/Liquid) with personalization
    • Localization (i18n)
    • User preferences (opt-in/out per channel/topic)
    • Do-Not-Disturb schedules
    • Priority levels (critical vs normal)
    • Tracking: delivery, open, click rates
    • Rate limiting per user/channel (avoid spamming)

3. Non-Functional Requirements

  • Scale: 1B+ users, 100M+ notifications/hour peak (e.g., Black Friday sales).
  • Latency:
    • Real-time: p99 < 500ms end-to-end
    • Non-critical: < 10s acceptable
  • Reliability: 99.999% delivery success; notifications are never silently dropped (failures are retried and, if permanent, land in a dead-letter queue).
  • Availability: Survive data center outages.
  • Cost efficiency: Pay-per-delivery model in mind.

4. Capacity Estimation

  • Peak: 100M notifications/hour → ~28K notifications/sec
  • Fan-out ratio: a marketing blast can target ~1M recipients, so at 28K events/sec the theoretical worst case is ~28 billion deliveries/sec; in practice blasts are batched and spread over time
  • Storage: 1B users × 100 bytes preferences → 100 GB
  • Event storage: 1 year retention → ~1 PB compressed
  • Queue throughput: a Kafka cluster sustaining 100K–1M msg/sec is comfortably sufficient
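
A quick back-of-envelope check of these numbers, as a sketch in Python; the constants simply restate the assumptions above:

```python
# Back-of-envelope capacity estimation using the assumptions above.

PEAK_NOTIFICATIONS_PER_HOUR = 100_000_000      # 100M notifications/hour peak
USERS = 1_000_000_000                          # 1B users
PREF_BYTES_PER_USER = 100                      # ~100 bytes of preference data per user

events_per_sec = PEAK_NOTIFICATIONS_PER_HOUR / 3600
print(f"Peak event rate: ~{events_per_sec:,.0f} notifications/sec")      # ~27,778/sec

pref_storage_gb = USERS * PREF_BYTES_PER_USER / 10**9
print(f"Preference storage: ~{pref_storage_gb:,.0f} GB")                 # ~100 GB

# Worst case: every event is a 1:1M marketing blast (unrealistic, hence batching).
worst_case_deliveries_per_sec = events_per_sec * 1_000_000
print(f"Worst-case fan-out: ~{worst_case_deliveries_per_sec:.2e} deliveries/sec")
```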

5. High-Level Architecture

At a high level, event sources (ride, order, and marketing services) publish to a Producer/Fanout tier that checks user preferences, applies rate limits, and renders templates, then writes per-channel messages to Kafka topics partitioned by user_id and priority. Channel-specific worker pools deliver via APNs/FCM, SES/SMTP, Twilio, and the WebSocket gateway, while delivery and engagement events stream into ClickHouse for tracking. The decisions behind each of these pieces are covered next.

6. Core Design Decisions

6.1 Fan-out Strategy

  • Fan-out on Write (used here): Expand recipients immediately → immediate consistency, but high write amplification during blasts.
    • Preferred for real-time and when recipient list is manageable.
  • Fan-out on Read: Store event once, users pull via WebSocket or polling → used by Facebook for news feed.

Hybrid: Real-time → fan-out on write; Marketing → store event + schedule batch job to fan-out.
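
A minimal sketch of that hybrid routing decision; the NotificationType enum and the return values are illustrative, not tied to any particular framework:

```python
from enum import Enum, auto

class NotificationType(Enum):
    REAL_TIME = auto()      # chat messages, ride updates
    TRANSACTIONAL = auto()  # order confirmations
    MARKETING = auto()      # promotional blasts
    DIGEST = auto()         # daily summaries

def route_event(event_type: NotificationType) -> str:
    """Decide between fan-out on write and store-then-batch fan-out."""
    if event_type in (NotificationType.REAL_TIME, NotificationType.TRANSACTIONAL):
        # Expand recipients immediately and enqueue per-recipient messages.
        return "fanout_on_write"
    # Marketing/digest: persist the event once and let a scheduled batch job
    # expand the audience later, smoothing the delivery rate.
    return "store_and_batch"

# route_event(NotificationType.MARKETING)  -> "store_and_batch"
```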

6.2 User Preferences & Rate Limiting

  • Preferences stored in DynamoDB/Cassandra with TTL for temporary opt-outs.
  • On send: Check preferences → if blocked, drop silently or queue for later.
  • Per-user rate limit via distributed token bucket (Redis) to prevent spam.
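
A sketch of the per-user, per-channel token bucket, run as an atomic Lua script in Redis (assumes the redis-py client; the key format, capacity, and refill rate are illustrative):

```python
import time
import redis  # assumes the redis-py package and a reachable Redis instance

# Refill-on-read token bucket, executed atomically inside Redis.
TOKEN_BUCKET_LUA = """
local key      = KEYS[1]
local capacity = tonumber(ARGV[1])
local refill   = tonumber(ARGV[2])   -- tokens per second
local now      = tonumber(ARGV[3])

local data   = redis.call('HMGET', key, 'tokens', 'ts')
local tokens = tonumber(data[1]) or capacity
local ts     = tonumber(data[2]) or now

tokens = math.min(capacity, tokens + (now - ts) * refill)
local allowed = 0
if tokens >= 1 then
    tokens = tokens - 1
    allowed = 1
end

redis.call('HMSET', key, 'tokens', tostring(tokens), 'ts', tostring(now))
redis.call('EXPIRE', key, 3600)
return allowed
"""

r = redis.Redis()
take_token = r.register_script(TOKEN_BUCKET_LUA)

def allow_send(user_id, channel, capacity=10, refill_per_sec=0.1):
    """Return True if this user/channel is still under its notification rate limit."""
    key = f"ratelimit:{user_id}:{channel}"
    return bool(take_token(keys=[key], args=[capacity, refill_per_sec, time.time()]))
```

The producer calls allow_send before enqueueing; a False result means the notification is deferred or dropped according to the preference rules above.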

6.3 Templating & Personalization

  • Templates in S3 or database with Handlebars/Mustache syntax.
  • Producer service renders template per user (or per segment for batch).
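
A minimal stand-in for the render step; a production system would use a real Handlebars/Mustache engine, but a tiny regex substitution is enough to show the idea:

```python
import re

def render(template, context):
    """Replace {{name}} placeholders with values from context (Mustache-style)."""
    return re.sub(
        r"\{\{\s*(\w+)\s*\}\}",
        lambda m: str(context.get(m.group(1), "")),
        template,
    )

# The template would normally be fetched from S3 or a database, keyed by type + locale.
template = "Your ride with {{driver_name}} has ended. Total: {{currency}}{{amount}}."
print(render(template, {"driver_name": "Alice", "currency": "$", "amount": "23.50"}))
```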

6.4 Delivery Reliability

  • Idempotency key per notification
  • Workers retry with exponential backoff (max 7 days for push)
  • APNs/FCM feedback loops for invalid tokens → update preferences
  • Dead Letter Queue for permanent failures
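
A sketch tying these four points together in a delivery worker; the in-memory collections and the send_push callable stand in for the real idempotency store, DLQ topic, and APNs/FCM client:

```python
import random
import time

processed_keys = set()        # stand-in for a Redis/DynamoDB idempotency store
dead_letter_queue = []        # stand-in for a real dead-letter topic

def deliver_with_retries(notification, send_push, max_attempts=5):
    """At-least-once delivery with an idempotency check, exponential backoff, and a DLQ."""
    idem_key = notification["idempotency_key"]
    if idem_key in processed_keys:
        return True   # duplicate consumed from the queue; already delivered

    for attempt in range(max_attempts):
        try:
            send_push(notification)             # e.g. the APNs/FCM HTTP call
            processed_keys.add(idem_key)
            return True
        except Exception:
            # Exponential backoff with jitter: ~1s, 2s, 4s, ... capped at 60s here.
            time.sleep(min(60, 2 ** attempt) + random.random())

    dead_letter_queue.append(notification)      # permanent failure -> DLQ for inspection
    return False
```

A real worker would persist the idempotency key with a TTL and publish permanent failures to a dedicated DLQ topic rather than an in-memory list.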

6.5 Real-Time Delivery

  • WebSocket connections sharded by user_id
  • On notification → route to user’s connected server via Redis Pub/Sub or Kafka
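
A sketch of that routing step with Redis Pub/Sub (assumes redis-py; the shard count and channel naming are illustrative):

```python
import hashlib
import json
import redis  # assumes the redis-py package

r = redis.Redis()
WS_SHARDS = 64  # number of WebSocket gateway shards

def shard_for(user_id):
    """Deterministically map a user to a WebSocket gateway shard."""
    return int(hashlib.md5(user_id.encode()).hexdigest(), 16) % WS_SHARDS

def push_realtime(user_id, payload):
    """Publish to the shard's channel; the gateway subscribed to that channel
    forwards the message over the user's open WebSocket connection."""
    channel = f"ws:shard:{shard_for(user_id)}"
    r.publish(channel, json.dumps({"user_id": user_id, "payload": payload}))

# On the gateway side, each instance subscribes to its own shard channel:
#   pubsub = r.pubsub(); pubsub.subscribe("ws:shard:17")
#   for message in pubsub.listen(): ...forward to the local socket for user_id...
```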

7. Detailed Flow Example: Ride Completion Notification (Uber-like)

  1. Ride service emits “ride.completed” event with {ride_id, user_id, driver_id}
  2. Producer receives → looks up templates + user preferences
  3. Preference check: User wants push + email, driver wants only push
  4. Fanout creates two notifications:
    • User: “Your ride with Alice has ended. Rate your experience!”
    • Driver: “Ride completed. Earnings: $23.50”
  5. Real-time queue → Push Workers → FCM/APNs → devices
  6. Tracking workers record delivery/open events → ClickHouse
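
The same flow condensed into code, as a sketch; the stub helpers, IDs, and template names are illustrative stand-ins for the preference store, template engine, and queue producer:

```python
# Minimal stubs standing in for the real services (preference store, template
# engine, and queue producer); all IDs and helper names here are illustrative.
PREFS = {
    "user-42":   {"push": True, "email": True},    # rider wants push + email
    "driver-77": {"push": True, "email": False},   # driver wants only push
}
QUEUE = []

def get_preferences(recipient_id):
    return PREFS.get(recipient_id, {"push": True, "email": False})

def render_message(template_name, context):
    return f"[{template_name}] ride {context['ride_id']} completed"

def enqueue(**message):
    QUEUE.append(message)

def handle_ride_completed(event):
    """Consume a 'ride.completed' event and fan out per-recipient notifications."""
    ride_id, user_id, driver_id = event["ride_id"], event["user_id"], event["driver_id"]

    # Steps 2-4: preference check + per-recipient template selection.
    plan = {
        user_id:   ("ride_completed_rider",  get_preferences(user_id)),
        driver_id: ("ride_completed_driver", get_preferences(driver_id)),
    }
    # Step 5: enqueue one message per recipient per enabled channel.
    for recipient_id, (template_name, prefs) in plan.items():
        body = render_message(template_name, {"ride_id": ride_id})
        for channel, enabled in prefs.items():
            if enabled:
                enqueue(queue="realtime", channel=channel, recipient_id=recipient_id,
                        body=body, idempotency_key=f"{ride_id}:{recipient_id}:{channel}")

handle_ride_completed({"ride_id": "r-1001", "user_id": "user-42", "driver_id": "driver-77"})
print(QUEUE)  # three messages: rider push + email, driver push
```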

8. Scaling & Optimization Summary

Component | Technology | Scaling Strategy | Key Optimization
--- | --- | --- | ---
Producer/Fanout | Stateless .NET/Java services | Auto-scale on CPU/queue lag | Batch preference lookups
Queue | Kafka (partitioned by user_id hash) | Add brokers | Separate partitions for priority
Workers | Kubernetes pods per channel | HPA on queue depth | Connection pooling to APNs/FCM
Preferences DB | DynamoDB/Cassandra | Sharded by user_id | Cache hot users in Redis
WebSocket | Socket.io cluster + Redis adapter | Horizontal pods | Sticky sessions via consistent hash

9. Performance in Practice (2025)

  • Instagram: >1B notifications/day
  • WhatsApp: 100B+ messages/day (similar architecture)
  • AWS SNS: 10M+ messages/sec sustained

This design achieves sub-second real-time delivery for critical notifications while efficiently handling massive marketing campaigns, making it a standard pattern behind most large consumer applications in 2025.
