1. Overview
A scalable notification service is a critical backend component for modern applications, responsible for delivering timely, personalized messages to users across multiple channels (push notifications, email, SMS, in-app, WhatsApp, etc.). This system powers user engagement in products like Facebook, Instagram, Uber, Slack, Amazon, and banking apps.
The design must handle extreme fan-out (one event → millions of recipients), support real-time delivery where required, respect user preferences and Do-Not-Disturb settings, and provide at-least-once delivery with idempotent handling (effectively exactly-once) at high reliability. Real-world inspirations include:
- AWS SNS + SQS + Pinpoint
- Firebase Cloud Messaging (FCM)
- OneSignal / Braze
- Facebook’s Iris/Taurus
- Twitter’s Fanout service
The core challenge is the inverted traffic pattern: writes are low (one event), but reads/deliveries are extremely high (fan-out to N users).
2. Functional Requirements
- Notification types:
  - Real-time (e.g., chat messages, ride updates)
  - Batch/digest (daily summaries)
  - Transactional (order confirmations)
  - Marketing/promotional
- Channels:
  - Mobile push (APNs, FCM)
  - Email (SMTP/SES)
  - SMS (Twilio/Nexmo)
  - In-app inbox
  - Web push (VAPID)
  - WebSocket for live apps
- Features:
  - Templating (Handlebars/Liquid) with personalization
  - Localization (i18n)
  - User preferences (opt-in/out per channel/topic)
  - Do-Not-Disturb schedules
  - Priority levels (critical vs. normal)
  - Tracking: delivery, open, and click rates
  - Rate limiting per user/channel (to avoid spamming users)
3. Non-Functional Requirements
- Scale: 1B+ users, 100M+ notifications/hour peak (e.g., Black Friday sales).
- Latency:
  - Real-time: p99 < 500ms end-to-end
  - Non-critical: < 10s acceptable
- Reliability: 99.999% delivery success, no lost notifications.
- Availability: Survive data center outages.
- Cost efficiency: design with the pay-per-delivery pricing of downstream providers (SMS, email, push) in mind.
4. Capacity Estimation
- Peak: 100M notifications/hour → ~28K notifications/sec
- Fan-out ratio: marketing blasts average ~1:1M recipients; if every event were a blast, that would imply ~28 billion deliveries/sec in theory, but blasts are a small fraction of events and are fanned out in batches over time
- Storage: 1B users × 100 bytes preferences → 100 GB
- Event storage: 1 year retention → ~1 PB compressed
- Queue throughput: Kafka at 100K–1M msg/sec easily sufficient
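As a sanity check, a short back-of-the-envelope script for the figures above (the ~1 KB average event size is an assumption, not a stated requirement):

```python
# Back-of-the-envelope capacity check for the figures above.
# The ~1 KB average event size is an assumption for illustration.

PEAK_NOTIFICATIONS_PER_HOUR = 100_000_000
USERS = 1_000_000_000
PREF_BYTES_PER_USER = 100
AVG_EVENT_BYTES = 1_000          # assumed
MARKETING_FANOUT = 1_000_000     # 1:1M recipients per blast

events_per_sec = PEAK_NOTIFICATIONS_PER_HOUR / 3600
print(f"Peak event rate: ~{events_per_sec/1e3:.0f}K events/sec")           # ~28K/sec

pref_storage_gb = USERS * PREF_BYTES_PER_USER / 1e9
print(f"Preference storage: ~{pref_storage_gb:.0f} GB")                    # ~100 GB

# Worst case if every event were a 1:1M marketing blast (never true in practice)
worst_case_deliveries = events_per_sec * MARKETING_FANOUT
print(f"Theoretical worst-case fan-out: ~{worst_case_deliveries/1e9:.0f}B deliveries/sec")

# One year of events at sustained peak, before compression
yearly_bytes = PEAK_NOTIFICATIONS_PER_HOUR * 24 * 365 * AVG_EVENT_BYTES
print(f"1-year event storage: ~{yearly_bytes/1e15:.1f} PB raw")            # ~0.9 PB, same order as the ~1 PB estimate
```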
5. High-Level Architecture
Events enter through a Producer/Fanout tier that resolves recipients, checks preferences, and renders templates; expanded notifications are written to Kafka topics partitioned by user_id hash, with priority traffic kept separate. Per-channel worker pools (push, email, SMS, in-app, web push, WebSocket) consume the queues and call the external providers (APNs/FCM, SES, Twilio). Supporting components: a preferences store (DynamoDB/Cassandra fronted by a Redis cache), dead-letter queues for permanent failures, and a tracking pipeline feeding ClickHouse. Sections 6–8 detail each component.
6. Core Design Decisions
6.1 Fan-out Strategy
- Fan-out on Write: expand the recipient list immediately at publish time → lowest delivery latency, but high write amplification during blasts.
  - Preferred for real-time notifications and when the recipient list is manageable.
- Fan-out on Read: store the event once; clients pull it (polling) or receive it when they connect → used by Facebook for the news feed.
- Hybrid (used here): real-time/transactional → fan-out on write; marketing → store the event once and schedule a batch job to fan out, as sketched below.
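A minimal sketch of that hybrid routing decision; `enqueue_realtime`, `schedule_batch_fanout`, and the 10K recipient threshold are hypothetical placeholders for the real queue and scheduler integrations:

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class NotificationEvent:
    event_type: str          # e.g. "ride.completed", "marketing.blast"
    recipients: List[str]    # already-resolved user IDs
    priority: str            # "critical" | "normal"

def dispatch(event: NotificationEvent,
             enqueue_realtime: Callable[[str, NotificationEvent], None],
             schedule_batch_fanout: Callable[[NotificationEvent], None]) -> None:
    """Hybrid strategy: fan-out on write for real-time/transactional events,
    store-and-batch for large marketing blasts."""
    is_marketing = event.event_type.startswith("marketing.")
    if is_marketing or len(event.recipients) > 10_000:
        # Store the event once; a scheduled job expands recipients in batches.
        schedule_batch_fanout(event)
    else:
        # Expand immediately: one queue message per recipient.
        for user_id in event.recipients:
            enqueue_realtime(user_id, event)

# Example wiring with stub sinks:
if __name__ == "__main__":
    dispatch(NotificationEvent("ride.completed", ["user-1", "driver-7"], "critical"),
             enqueue_realtime=lambda uid, ev: print(f"realtime -> {uid}: {ev.event_type}"),
             schedule_batch_fanout=lambda ev: print(f"batch job scheduled for {ev.event_type}"))
```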
6.2 User Preferences & Rate Limiting
- Preferences stored in DynamoDB/Cassandra with TTL for temporary opt-outs.
- On send: Check preferences → if blocked, drop silently or queue for later.
- Per-user rate limit via distributed token bucket (Redis) to prevent spam.
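A sketch of the token-bucket accounting; in production the bucket state would live in Redis (typically updated atomically via a Lua script), but an in-memory version shows the logic:

```python
import time

class TokenBucket:
    """Per-user/channel token bucket. In production this state lives in Redis;
    this in-memory version illustrates the refill/consume math."""

    def __init__(self, capacity: int, refill_per_sec: float):
        self.capacity = capacity
        self.refill_per_sec = refill_per_sec
        self.tokens = float(capacity)
        self.last_refill = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Refill proportionally to elapsed time, capped at capacity.
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last_refill) * self.refill_per_sec)
        self.last_refill = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

# Example: at most 5 push notifications per user per minute.
bucket = TokenBucket(capacity=5, refill_per_sec=5 / 60)
sent = sum(bucket.allow() for _ in range(10))
print(f"{sent} of 10 notifications allowed")   # 5 of 10 notifications allowed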
6.3 Templating & Personalization
- Templates in S3 or database with Handlebars/Mustache syntax.
- Producer service renders template per user (or per segment for batch).
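A toy rendering example; Python's `string.Template` stands in for the Handlebars/Mustache engine, and the template IDs and locale keys are illustrative:

```python
from string import Template

# Templates would live in S3 or a DB, keyed by (template_id, locale);
# string.Template stands in for Handlebars/Mustache here.
TEMPLATES = {
    ("ride_completed_rider", "en"): Template("Your ride with $driver_name has ended. Rate your experience!"),
    ("ride_completed_driver", "en"): Template("Ride completed. Earnings: $$$earnings"),
}

def render(template_id: str, locale: str, context: dict) -> str:
    # Fall back to English if the requested locale has no template.
    template = TEMPLATES.get((template_id, locale)) or TEMPLATES[(template_id, "en")]
    return template.safe_substitute(context)

print(render("ride_completed_rider", "en", {"driver_name": "Alice"}))
print(render("ride_completed_driver", "en", {"earnings": "23.50"}))
```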
6.4 Delivery Reliability
- Idempotency key per notification
- Workers retry with exponential backoff (max 7 days for push)
- APNs/FCM feedback loops for invalid tokens → update preferences
- Dead Letter Queue for permanent failures
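A sketch of the retry policy, with hypothetical `requeue` and `send_to_dlq` callbacks standing in for the queue and DLQ integrations:

```python
import hashlib
import random

MAX_RETRY_WINDOW_SEC = 7 * 24 * 3600   # the design caps push retries at 7 days

def idempotency_key(notification_id: str, channel: str) -> str:
    """Stable key so a redelivered queue message hits the provider only once."""
    return hashlib.sha256(f"{notification_id}:{channel}".encode()).hexdigest()

def backoff_delay(attempt: int, base: float = 1.0, cap: float = 3600.0) -> float:
    """Exponential backoff with full jitter, capped at one hour per attempt."""
    return random.uniform(0, min(cap, base * 2 ** attempt))

def schedule_retry(attempt: int, elapsed_sec: float, requeue, send_to_dlq) -> None:
    """Requeue with backoff until the retry window is exhausted, then dead-letter."""
    if elapsed_sec >= MAX_RETRY_WINDOW_SEC:
        send_to_dlq()
    else:
        requeue(delay_sec=backoff_delay(attempt))

# Example usage with stub callbacks:
print(idempotency_key("notif-123", "push")[:16])
schedule_retry(attempt=3, elapsed_sec=120,
               requeue=lambda delay_sec: print(f"requeue in {delay_sec:.1f}s"),
               send_to_dlq=lambda: print("moved to DLQ"))
```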
6.5 Real-Time Delivery
- WebSocket connections sharded by user_id
- On notification → route to user’s connected server via Redis Pub/Sub or Kafka
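One way to do that routing: a consistent-hash ring from user_id to WebSocket server, so the notifier knows which server's Pub/Sub channel to publish to (a sketch; the channel naming convention is assumed):

```python
import bisect
import hashlib

class ConnectionRouter:
    """Maps user_id -> WebSocket server via a consistent hash ring, so the
    notification service knows which server's Pub/Sub channel to publish to."""

    def __init__(self, servers, vnodes: int = 100):
        # Virtual nodes smooth the distribution across a small server set.
        self.ring = sorted(
            (self._hash(f"{s}:{v}"), s) for s in servers for v in range(vnodes)
        )
        self.keys = [k for k, _ in self.ring]

    @staticmethod
    def _hash(value: str) -> int:
        return int(hashlib.md5(value.encode()).hexdigest(), 16)

    def server_for(self, user_id: str) -> str:
        idx = bisect.bisect(self.keys, self._hash(user_id)) % len(self.ring)
        return self.ring[idx][1]

router = ConnectionRouter(["ws-1", "ws-2", "ws-3"])
# Publish to the channel of the server holding the user's socket,
# e.g. redis.publish(f"notify:{router.server_for(user_id)}", payload)
print(router.server_for("user-42"))
```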
7. Detailed Flow Example: Ride Completion Notification (Uber-like)
- Ride service emits “ride.completed” event with {ride_id, user_id, driver_id}
- Producer receives → looks up templates + user preferences
- Preference check: User wants push + email, driver wants only push
- Fanout creates two notifications:
  - User: “Your ride with Alice has ended. Rate your experience!”
  - Driver: “Ride completed. Earnings: $23.50”
- Real-time queue → Push Workers → FCM/APNs → devices
- Tracking workers record delivery/open events → ClickHouse
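The same flow, condensed into a runnable sketch (the preference data, template IDs, and the enqueue step are illustrative stand-ins):

```python
# End-to-end sketch of the ride.completed flow above.
PREFERENCES = {
    "user-1":   {"push": True, "email": True},
    "driver-7": {"push": True, "email": False},
}

def handle_ride_completed(event: dict) -> None:
    notifications = [
        (event["user_id"],   "ride_completed_rider",  {"driver_name": "Alice"}),
        (event["driver_id"], "ride_completed_driver", {"earnings": "23.50"}),
    ]
    for recipient, template_id, ctx in notifications:
        prefs = PREFERENCES.get(recipient, {})
        for channel, enabled in prefs.items():
            if enabled:
                # In the real system: enqueue to the per-channel Kafka topic;
                # push workers then call FCM/APNs, email workers call SES, etc.
                print(f"[{channel}] -> {recipient}: {template_id} {ctx}")

handle_ride_completed({"ride_id": "r-9", "user_id": "user-1", "driver_id": "driver-7"})
```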
8. Scaling & Optimization Summary
| Component | Technology | Scaling Strategy | Key Optimization |
|---|---|---|---|
| Producer/Fanout | Stateless .NET/Java services | Auto-scale on CPU/queue lag | Batch preference lookups |
| Queue | Kafka (partitioned by user_id hash) | Add brokers | Separate partitions for priority |
| Workers | Kubernetes pods per channel | HPA on queue depth | Connection pooling to APNs/FCM |
| Preferences DB | DynamoDB/Cassandra | Sharded by user_id | Cache hot users in Redis |
| WebSocket | Socket.io cluster + Redis adapter | Horizontal pods | Sticky sessions via consistent hash |
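For the queue row, one way to combine per-user ordering with priority isolation is to hash user_id into a partition block reserved for each priority (the partition counts and block sizes below are illustrative):

```python
import hashlib

TOTAL_PARTITIONS = 64
CRITICAL_PARTITIONS = range(0, 16)     # reserved for critical notifications
NORMAL_PARTITIONS = range(16, 64)

def partition_for(user_id: str, priority: str) -> int:
    """Hash user_id within the priority's partition block: messages for the same
    user and priority stay ordered, and critical traffic never queues behind blasts."""
    block = CRITICAL_PARTITIONS if priority == "critical" else NORMAL_PARTITIONS
    h = int(hashlib.md5(user_id.encode()).hexdigest(), 16)
    return block[h % len(block)]

print(partition_for("user-42", "critical"), partition_for("user-42", "normal"))
```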
9. Performance in Practice (2025)
- Instagram: >1B notifications/day
- WhatsApp: 100B+ messages/day (similar architecture)
- AWS SNS: 10M+ messages/sec sustained
This design achieves sub-second real-time delivery for critical notifications while efficiently handling massive marketing campaigns; variations of this pattern underpin most large consumer applications as of 2025.




