What If Your Marketing Stack Could "Think" in Real-Time?
Jithu Jagadish
4 min read Feb 20, 2026


I’ve spent a decent amount of time reviewing architectural diagrams that look exactly the same as they did five years ago. We’ve moved to the cloud, sure, but we’re still addicted to batch processing.

We still run cron jobs. We still poll databases. We still send marketing emails in giant, monolithic blasts.

But looking at the current capabilities of Amazon EventBridge, AWS Lambda, and Amazon Bedrock, it’s clear that the "Batch" era is becoming obsolete. The modern challenge isn't running campaigns; it's architecting systems that "react."

I call this pattern "Just-In-Time" Generative AI.

It is a move away from polling databases and toward an architecture that creates hyper-personalized content the millisecond a user interacts with an app. Here is the technical blueprint for how to build it correctly.

The Core Philosophy: Event-Driven over Cron-Driven

The biggest bottleneck in current personalization is latency. If a user abandons a cart, and your cron job runs 30 minutes later, the context is cold.

In this architecture, the frontend is dumb. It doesn't know about AI. It doesn't know about emails. It simply emits a state change.

The Trigger: We use Amazon EventBridge as the central nervous system. The schema design is critical here. We don't just send a generic "User Left" signal; we enforce a strict schema that carries the immediate context.
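To make that concrete, here is a minimal sketch of what such an event could look like. The `Source`/`DetailType`/`Detail` envelope follows EventBridge's PutEvents shape; the event name, detail fields, and the validation guard are illustrative assumptions, not a published schema.

```javascript
// Hypothetical "CartAbandoned" event as it would be handed to
// EventBridge PutEvents. The envelope keys are the real PutEvents
// shape; the detail payload is our own (illustrative) convention.
const cartAbandonedEvent = {
  Source: "app.checkout",
  DetailType: "CartAbandoned",
  Detail: JSON.stringify({
    userId: "u-1842",
    items: [{ sku: "TRV-KYOTO-01", name: "Kyoto City Pass" }],
    abandonedAt: "2026-02-20T19:58:02Z",
    // The immediate context rides along with the event, so the
    // consumer never has to poll a database for it later.
    lastScreen: "payment",
    sessionDurationSec: 412,
  }),
};

// Reject malformed events before they ever reach the bus.
function isValidCartEvent(evt) {
  const d = JSON.parse(evt.Detail || "{}");
  return Boolean(d.userId && d.abandonedAt && Array.isArray(d.items));
}
```

Enforcing the schema at the producer keeps every downstream consumer simple: if the event arrived, its context is trustworthy.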

The Compute: The Orchestrator Pattern

An AWS Lambda function triggers off this event. But here is where the design diverges from standard "chatbot" tutorials.

For a production-grade "Just-In-Time" notification system, I do not recommend connecting this Lambda directly to a heavy Vector Database like Pinecone initially. Vector searches add latency and complexity that often isn't needed for transactional notifications.

Instead, the robust pattern here is "RAG-Lite":

  1. Parallel Fetching: We use Node.js Promise.all() to fan out two lightweight requests:
  • User State: A direct GetItem from DynamoDB (Single Table Design) to grab hard preferences (e.g., CoffeeLover: true).
  • Environmental State: A fetch to a Weather API for the destination.

The goal is to assemble a context object in under 200ms.
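A minimal sketch of that fan-out, with stub functions standing in for the DynamoDB GetItem and the weather API call (the function names and data shapes here are assumptions for illustration):

```javascript
// Stubs standing in for the real DynamoDB GetItem and weather API
// calls; in production each would be an SDK/HTTP request.
async function getUserState(userId) {
  return { userId, coffeeLover: true, interests: ["History", "Coffee"] };
}
async function getWeather(city) {
  return { city, condition: "Rainy", tempC: 12 };
}

// Fan out both lookups in parallel: total latency is the slower of
// the two requests, not their sum.
async function assembleContext(userId, destination) {
  const [user, weather] = await Promise.all([
    getUserState(userId),
    getWeather(destination),
  ]);
  return { user, weather };
}
```

Because the two requests are independent, the parallel fan-out is what makes the 200ms budget realistic: a sequential version would pay for both round trips back to back.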

The Brain: Model Selection & Latency Budgets

This is the most critical architectural decision. Most developers default to the smartest model available (like GPT-4 or Claude 3.5 Sonnet).

That is an anti-pattern for this use case.

If this system scales to 10,000 concurrent users, a 5-second inference time is unacceptable even for a background worker, and a frontier model is overkill for writing a 20-word notification.

The correct design choice is Claude 3 Haiku or Amazon Titan.

  • The Trade-off: We sacrifice complex reasoning (which we don't need) for raw speed and cost efficiency.
  • The Target: We aim for a P99 latency of <1.5 seconds for the generation.
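As a sketch, the Bedrock request for Haiku might be assembled like this. The model ID and message-body shape follow Anthropic's Messages API on Bedrock as I understand it; verify both against the current Bedrock documentation before relying on them.

```javascript
// Build the parameters for a Bedrock InvokeModel call targeting
// Claude 3 Haiku. In the Lambda handler, this object would be passed
// to new InvokeModelCommand(...) from @aws-sdk/client-bedrock-runtime.
function buildHaikuRequest(systemPrompt, userPrompt) {
  return {
    modelId: "anthropic.claude-3-haiku-20240307-v1:0",
    contentType: "application/json",
    accept: "application/json",
    body: JSON.stringify({
      anthropic_version: "bedrock-2023-05-31",
      max_tokens: 120, // a 20-word notification needs very few tokens
      temperature: 0.7,
      system: systemPrompt,
      messages: [{ role: "user", content: userPrompt }],
    }),
  };
}
```

Capping `max_tokens` low is part of the latency budget: the model cannot ramble past the length a push notification allows.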

The "Prompt as Code" Strategy

In this stack, the prompt isn't just text; it's a function. The System Prompt must enforce JSON output only.

Non-deterministic outputs are the enemy of stable architecture. If the LLM returns a paragraph of text, downstream code breaks. By forcing a JSON schema, we can validate the output programmatically before sending it to the user.

The System Prompt Structure:

System: You are a notification engine. Output JSON only.

Context:

  • Destination: Kyoto (Rainy, 12C)

  • User Interest: History, Coffee

Task: Write a push notification (max 15 words) connecting the weather to the interest.

Output Schema: { "title": string, "body": string, "sentiment": string }
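Downstream, the raw model output should be validated against that schema before anything is sent. A minimal guard (the function name is illustrative) might look like:

```javascript
// Parse and validate the model's raw text against the expected
// { title, body, sentiment } schema. Anything else -- prose, missing
// keys, wrong types -- is rejected so downstream code never breaks.
function parseNotification(rawText) {
  let parsed;
  try {
    parsed = JSON.parse(rawText);
  } catch {
    return null; // the model returned prose, not JSON
  }
  const { title, body, sentiment } = parsed;
  if (
    typeof title !== "string" ||
    typeof body !== "string" ||
    typeof sentiment !== "string"
  ) {
    return null;
  }
  return { title, body, sentiment };
}
```

A `null` here feeds directly into the fallback logic described below: an invalid generation is treated exactly like a failed one.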

The Infrastructure Advantage

The beauty of this blueprint is the infrastructure footprint.

  • At 3 AM (Zero Traffic): Infrastructure cost is $0. There are no EC2 instances idling. No containers waiting. The architecture effectively doesn't exist until an event occurs.
  • At 8 PM (Traffic Spike): If 5,000 users leave the app simultaneously, EventBridge buffers the load, and Lambda scales horizontally to handle the concurrent invocations.

The Reality Check: Guardrails

The code is the easy part. The hard part—the part that requires a true architect—is the Safety Layer.

For deployment, wrapping the Bedrock call in Amazon Bedrock Guardrails is mandatory. We need to prevent the AI from hallucinating discounts ("Get 50% off!") or making promises the business can't keep. We also implement a "Circuit Breaker" pattern: if the AI API times out, fall back to a static template. Never let the user see an error.
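A sketch of that fallback behavior using a simple timeout race (the template content and the budget value are placeholders):

```javascript
// Safe static template shipped whenever generation fails or runs
// past the latency budget.
const FALLBACK = {
  title: "Your trip is waiting",
  body: "Pick up where you left off.",
  sentiment: "neutral",
};

// Race the generation promise against a timer.
function withTimeout(promise, ms) {
  return Promise.race([
    promise,
    new Promise((_, reject) =>
      setTimeout(() => reject(new Error("generation timeout")), ms)
    ),
  ]);
}

// Never surface an error to the user: a timeout and an exception
// both degrade to the static template.
async function generateOrFallback(generateFn, budgetMs = 1500) {
  try {
    return await withTimeout(generateFn(), budgetMs);
  } catch {
    return FALLBACK;
  }
}
```

A production circuit breaker would also track consecutive failures and stop calling the model entirely while it is unhealthy, but the degrade-to-template behavior is the part the user ever notices.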

Conclusion

We have the tools to stop building "campaigns" and start building "reactions." The technology is ready; the constraint is simply our habit of thinking in batches.

This architecture—EventBridge, Lambda, Haiku—is the modern standard for engagement. It’s lean, it’s intelligent, and it’s built to scale.
