
Case study

Scaling an omnichannel messaging platform to 100M+ messages a month

SMS/MMS/RCS for 19 retail banners — fast, compliant, and observable

Client
Publicis Sapient / Albertsons Companies
Role
Technical Lead
Period
Jan 2022 – Present

Albertsons Companies runs 19 grocery and pharmacy retail banners, and every one of them reaches customers by text with marketing messages: promotions, offers, and campaign announcements. My team owns the unified marketing messaging platform behind those sends, which carries over 100 million messages a month.

Impact at a glance

  • 6×: outbound sends (<80 → 500 TPS)
  • 10×: inbound processing (500 → 5,000 TPS)
  • 100M+: messages per month
  • +30%: customer reachability with RCS

The challenge

When I joined as technical lead, outbound sends to carriers through Sinch moved fewer than 80 messages per second, and the inbound stream coming back from Sinch (delivery receipts and customer replies) was processed at only about 500 per second while arriving at 1,200. That was nowhere near enough for peak campaigns across 19 banners, and with delivery data trailing far behind the sends, campaign tracking was anything but real-time.

On top of the throughput ceiling, the system did not itself enforce the TCPA (Telephone Consumer Protection Act, the US law governing text-messaging consent and quiet hours), leaving a Fortune 500 company exposed to class-action risk measured in millions. Customers were also drifting toward richer channels, such as RCS, that the platform couldn't speak at all. Every fix had to land on a live platform with zero tolerance for delivery outages.

What I did

I tackled it in three production releases. First, the outbound path: I refactored the Spring WebFlux and Kafka pipeline that sends SMS/MMS to Sinch, fixing the performance issues that capped it, and raised sustained outbound throughput from under 80 to 500 transactions per second (TPS). That 6× improvement shipped to production on its own.
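The case study doesn't name the bottlenecks that capped the outbound path. One common cause in send pipelines is waiting on each carrier call before starting the next. By Little's law, sustained throughput is roughly the number of requests in flight divided by per-request latency, so 500 TPS at an assumed 200 ms round trip needs about 100 concurrent sends. The sketch below illustrates that idea in plain Java: a `Semaphore` caps outstanding non-blocking sends, which is roughly what Reactor's `flatMap(mapper, concurrency)` does inside a WebFlux pipeline. `BoundedSender`, `sendToCarrier`, and the 200 ms latency are hypothetical; this is not the project's actual code.

```java
import java.util.Collections;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class BoundedSender {

    /** Little's law: concurrent requests needed = target throughput x per-request latency. */
    static int requiredConcurrency(int targetTps, int latencyMillis) {
        return (int) Math.ceil(targetTps * latencyMillis / 1000.0);
    }

    private final Semaphore inFlight;
    // Daemon timer thread that completes the simulated carrier responses.
    private final ScheduledExecutorService timer = Executors.newSingleThreadScheduledExecutor(r -> {
        Thread t = new Thread(r, "carrier-sim");
        t.setDaemon(true);
        return t;
    });
    final AtomicInteger completed = new AtomicInteger();

    BoundedSender(int maxInFlight) {
        this.inFlight = new Semaphore(maxInFlight);
    }

    /** Hypothetical stand-in for a non-blocking carrier API call that completes after latencyMillis. */
    private CompletableFuture<Void> sendToCarrier(String message, int latencyMillis) {
        CompletableFuture<Void> response = new CompletableFuture<>();
        timer.schedule(() -> response.complete(null), latencyMillis, TimeUnit.MILLISECONDS);
        return response;
    }

    /** Sends every message while keeping at most maxInFlight requests outstanding. */
    void sendAll(List<String> messages, int latencyMillis) {
        CompletableFuture<?>[] pending = new CompletableFuture<?>[messages.size()];
        for (int i = 0; i < messages.size(); i++) {
            inFlight.acquireUninterruptibly(); // waits only when the in-flight cap is reached
            pending[i] = sendToCarrier(messages.get(i), latencyMillis)
                    .whenComplete((ok, err) -> {
                        completed.incrementAndGet(); // counts finished sends, successful or not
                        inFlight.release();          // free a slot for the next message
                    });
        }
        CompletableFuture.allOf(pending).join();
    }

    void shutdown() {
        timer.shutdownNow();
    }

    public static void main(String[] args) {
        int latencyMillis = 200; // assumed round trip, not a measured figure
        int concurrency = requiredConcurrency(500, latencyMillis);
        System.out.println("in-flight sends needed for 500 TPS: " + concurrency); // 100

        BoundedSender sender = new BoundedSender(concurrency);
        long start = System.nanoTime();
        sender.sendAll(Collections.nCopies(1_000, "promo"), latencyMillis);
        double seconds = (System.nanoTime() - start) / 1e9;
        System.out.printf("sent %d messages in %.1f s%n", sender.completed.get(), seconds);
        sender.shutdown();
    }
}
```

Raising the cap only helps until the carrier API's own rate limits or the downstream Kafka producer become the constraint, so in practice the in-flight limit would be tuned against measured latency rather than fixed.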

Second, reach and compliance together. I designed the RCS channel end to end: the Sinch integration, Google RCS Business Agents, and a Command Pattern dispatch layer that routes each message to the best channel available for its recipient. I designed the TCPA campaign checks into the send path itself, so every message is validated for the recipient's opt-in status and for falling inside the allowed time windows before it goes out. Violations went from an operational risk to structurally impossible, and rich messaging lifted customer reachability by 30%.
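Here is a minimal sketch of how a Command Pattern dispatch layer and a send-time TCPA check can fit together, assuming each recipient record carries an opt-in flag, an RCS-capability flag, and a time zone. All class and method names are hypothetical, the channel commands return strings instead of calling Sinch, and the 8 a.m. to 9 p.m. recipient-local window is an illustrative assumption rather than the platform's configured windows.

```java
import java.time.Instant;
import java.time.LocalTime;
import java.time.ZoneId;
import java.time.ZonedDateTime;
import java.util.List;

public class ChannelDispatch {

    /** Hypothetical recipient record: consent flag, device capability, and local time zone. */
    record Recipient(String phone, boolean optedIn, boolean rcsCapable, ZoneId zone) {}

    /** Command Pattern: each channel wraps "send this message" behind one interface. */
    interface SendCommand {
        boolean canHandle(Recipient r);
        String execute(Recipient r, String body);
    }

    static final class RcsSendCommand implements SendCommand {
        @Override public boolean canHandle(Recipient r) { return r.rcsCapable(); }
        @Override public String execute(Recipient r, String body) {
            return "RCS -> " + r.phone(); // stand-in for the real RCS send
        }
    }

    static final class SmsSendCommand implements SendCommand {
        @Override public boolean canHandle(Recipient r) { return true; } // SMS as the universal fallback
        @Override public String execute(Recipient r, String body) {
            return "SMS -> " + r.phone(); // stand-in for the real SMS send
        }
    }

    // Assumed allowed window in the recipient's local time: [08:00, 21:00).
    static final LocalTime WINDOW_START = LocalTime.of(8, 0);
    static final LocalTime WINDOW_END = LocalTime.of(21, 0);

    /** Send-time compliance check: recipient must be opted in and inside the allowed window. */
    static boolean tcpaAllows(Recipient r, Instant now) {
        LocalTime local = ZonedDateTime.ofInstant(now, r.zone()).toLocalTime();
        boolean inWindow = !local.isBefore(WINDOW_START) && local.isBefore(WINDOW_END);
        return r.optedIn() && inWindow;
    }

    // Ordered by preference: richest channel first.
    private final List<SendCommand> commands = List.of(new RcsSendCommand(), new SmsSendCommand());

    /** Every message passes the compliance check before any channel command can run. */
    String dispatch(Recipient r, String body, Instant now) {
        if (!tcpaAllows(r, now)) {
            return "BLOCKED";
        }
        for (SendCommand cmd : commands) {
            if (cmd.canHandle(r)) {
                return cmd.execute(r, body);
            }
        }
        return "NO_CHANNEL";
    }
}
```

Because the compliance check sits in `dispatch` ahead of every command, adding a new channel means adding one more `SendCommand` without touching the guard, which is the sense in which a violation is blocked by structure rather than by each caller remembering to check.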

Third came the inbound flow back from Sinch, which carries delivery receipts and customer replies. The platform could land 1,200 messages per second on the Kafka topic, but the services behind it processed only about 500. I redesigned ingestion and processing around Java 21 Virtual Threads, so the platform now receives 5,000+ messages per second on the topic and processes them at nearly the same rate. That 10× processing improvement turned campaign tracking into a near-real-time view.
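With 1,200 messages a second arriving and about 500 processed, the backlog grows by roughly 700 messages every second, so processing has to keep pace with arrival. The sketch below shows the core Java 21 technique named above: a virtual-thread-per-task executor that lets each inbound event block on I/O without tying up a platform thread. `InboundEvent`, `handle`, and the 50 ms simulated I/O delay are hypothetical stand-ins, and the Kafka consumer loop that would feed `processBatch` is omitted.

```java
import java.time.Duration;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;

public class InboundProcessor {

    /** Hypothetical inbound event: a delivery receipt or a customer reply. */
    record InboundEvent(String messageId, String type) {}

    final AtomicInteger processed = new AtomicInteger();

    /** Stand-in for blocking per-event work such as a database write; the sleep simulates I/O latency. */
    private void handle(InboundEvent event) {
        try {
            Thread.sleep(Duration.ofMillis(50));
            processed.incrementAndGet();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt status
        }
    }

    /** Runs one virtual thread per event and returns once every event has been handled. */
    void processBatch(List<InboundEvent> batch) {
        try (ExecutorService executor = Executors.newVirtualThreadPerTaskExecutor()) {
            for (InboundEvent event : batch) {
                executor.submit(() -> handle(event));
            }
        } // ExecutorService.close() waits for all submitted tasks to finish
    }

    public static void main(String[] args) {
        List<InboundEvent> batch = new ArrayList<>();
        for (int i = 0; i < 5_000; i++) {
            batch.add(new InboundEvent("msg-" + i, i % 2 == 0 ? "DELIVERY_RECEIPT" : "REPLY"));
        }
        InboundProcessor processor = new InboundProcessor();
        long start = System.nanoTime();
        processor.processBatch(batch);
        long millis = (System.nanoTime() - start) / 1_000_000;
        // Sequentially this would take 5,000 x 50 ms = 250 s; virtual threads overlap the waits.
        System.out.println("processed " + processor.processed.get() + " events in " + millis + " ms");
    }
}
```

If offsets are committed only after `processBatch` returns, a crash replays the batch rather than losing it (at-least-once delivery), so handlers should be idempotent. Handling events concurrently also gives up strict per-partition ordering within a batch, which may or may not matter for receipts and replies.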

Throughout, we instrumented everything with a Grafana + Loki observability stack, so delivery health is visible in real time across all 19 banners.
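The case study doesn't describe the log format. Loki indexes only stream labels and parses log content at query time, so structured log lines are what make per-banner delivery dashboards practical. Here is a small sketch, assuming logfmt output and hypothetical field names (`event`, `banner`, `channel`, `status`, `message_id`):

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

public class DeliveryLog {

    /** Formats fields as one logfmt line, keeping insertion order. */
    static String logfmt(Map<String, String> fields) {
        return fields.entrySet().stream()
                .map(e -> e.getKey() + "=" + quote(e.getValue()))
                .collect(Collectors.joining(" "));
    }

    /** Quotes values that are empty or contain spaces, quotes, or '=', escaping backslashes and quotes. */
    private static String quote(String v) {
        if (v.isEmpty() || v.contains(" ") || v.contains("\"") || v.contains("=")) {
            return "\"" + v.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
        }
        return v;
    }

    /** Hypothetical delivery-receipt log line, one per inbound receipt. */
    static String deliveryEvent(String banner, String channel, String status, String messageId) {
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("event", "delivery_receipt");
        fields.put("banner", banner);
        fields.put("channel", channel);
        fields.put("status", status);
        fields.put("message_id", messageId);
        return logfmt(fields);
    }

    public static void main(String[] args) {
        System.out.println(deliveryEvent("safeway", "SMS", "delivered", "abc123"));
    }
}
```

A Grafana panel could then chart failures per banner with a LogQL query along the lines of `sum by (banner) (count_over_time({app="messaging"} | logfmt | status="failed" [5m]))`, where the `app` stream label is an assumption.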

Results

  • Outbound sends scaled from under 80 to a sustained 500 TPS for SMS/MMS (6×) — released to production first
  • Inbound flow re-architected: 5,000+ TPS received on the topic and processed at nearly the same rate, up from 1,200 received / 500 processed — campaign tracking became near-real-time
  • TCPA checks at send time: opt-in status and allowed time windows validated on every message — millions in class-action exposure eliminated
  • +30% customer reachability via the new end-to-end RCS channel
  • 100M+ messages per month across 19 retail banners, observable in real time with Grafana + Loki

Tech stack

  • Java 21
  • Spring WebFlux
  • Kafka
  • MongoDB
  • Sinch
  • RCS
  • Grafana
  • Loki
  • TCPA
  • GCP