Building Serverless Chatbots with Amazon ElastiCache and Aurora PostgreSQL
Overview
The presentation discusses how to build scalable, high-performance serverless chatbots using Amazon ElastiCache and Aurora PostgreSQL.
The case study focuses on Flightly, a fictional travel platform that allows customers to search flights, book hotels, and plan vacations.
Flightly's Initial Architecture and Challenges
Flightly initially built an MVP with a simple chatbot architecture, but as user adoption grew, they faced performance issues.
The system was hitting database bottlenecks, with 30-second average response times for a single booking, leading to a 50% user abandonment rate.
This translated into an estimated $50 million in annual lost revenue.
The Need for Caching and Semantic Search
To address the performance issues, Flightly decided to keep Aurora PostgreSQL as the source of truth and add a caching layer.
Caching frequently asked questions, baggage policies, and booking templates in Amazon ElastiCache can provide sub-millisecond response times.
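A minimal sketch of the cache-aside pattern described above, using a plain in-memory dict as a stand-in for ElastiCache so the example is self-contained; in production the reads and writes would go through a Redis client (e.g. `SET key value EX ttl`). The names `fetch_from_aurora`, `get_answer`, and `CACHE_TTL_SECONDS` are illustrative, not from the presentation.

```python
import time

# Dict standing in for Amazon ElastiCache; each entry stores
# (answer, expiry timestamp) to mimic a Redis SET with a TTL.
CACHE_TTL_SECONDS = 3600
cache: dict[str, tuple[str, float]] = {}

def fetch_from_aurora(question: str) -> str:
    """Stand-in for the authoritative (slow) Aurora PostgreSQL lookup."""
    return f"answer-for:{question}"

def get_answer(question: str) -> tuple[str, bool]:
    """Cache-aside read: return (answer, was_cache_hit)."""
    entry = cache.get(question)
    if entry is not None and entry[1] > time.monotonic():
        return entry[0], True              # fast path: served from cache
    answer = fetch_from_aurora(question)   # slow path: query the database
    cache[question] = (answer, time.monotonic() + CACHE_TTL_SECONDS)
    return answer, False

first = get_answer("What is the carry-on baggage limit?")
second = get_answer("What is the carry-on baggage limit?")
```

The first call misses and falls through to the database; the repeat call is served from the cache, which is what turns 30-second lookups into sub-millisecond ones for hot questions.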
Flightly also implemented semantic search using Aurora PostgreSQL's pgvector extension, which stores vector embeddings and supports similarity searches over them.
This enables the chatbot to understand user intent beyond just keyword matching, providing more relevant and personalized responses.
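To make the similarity-search idea concrete, here is a sketch using toy 3-dimensional embeddings and a pure-Python cosine similarity; real embeddings would come from an embedding model and be stored in a pgvector column. The table and column names in the SQL string are illustrative assumptions; `<=>` is pgvector's cosine-distance operator.

```python
import math

# Toy embeddings standing in for model-generated vectors.
DOCS = {
    "Carry-on bags must fit under the seat.": [0.9, 0.1, 0.0],
    "Hotels can be booked up to a year ahead.": [0.1, 0.9, 0.1],
    "Vacation packages bundle flights and hotels.": [0.2, 0.6, 0.8],
}

# Roughly equivalent pgvector query (illustrative schema):
PGVECTOR_QUERY = """
SELECT content FROM faq
ORDER BY embedding <=> %(query_vec)s
LIMIT 1;
"""

def cosine_similarity(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norms

def nearest(query_vec: list[float]) -> str:
    """Return the document whose embedding is most similar to the query."""
    return max(DOCS, key=lambda d: cosine_similarity(DOCS[d], query_vec))

# A "baggage-like" query vector matches the baggage document even though
# it shares no keywords with it -- intent, not keyword matching.
match = nearest([0.85, 0.15, 0.05])
```

This is the property the presentation highlights: a question phrased as "how big can my hand luggage be" lands near the baggage policy in embedding space, with no keyword overlap required.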
Architectural Patterns
Context Cache: Storing user context (chat history, session state, preferences) in ElastiCache for quick retrieval.
Embedding Cache: Caching vector embeddings in ElastiCache to skip the expensive embedding generation process for repeat queries.
Durable Semantic Caching: Caching semantic search results in ElastiCache to avoid recomputing vector distances for similar queries.
Tiered Memory Management: Using ElastiCache for short-term memory (chat messages, session state) and Aurora PostgreSQL for long-term memory (episodic recall, user preferences).
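One way the semantic-caching pattern above might look in code, with a plain list standing in for ElastiCache so the sketch is runnable; the similarity threshold, the `answer_query` helper, and the `run_full_search` callback are all assumptions for illustration, not APIs from the presentation.

```python
import math

SIMILARITY_THRESHOLD = 0.95  # illustrative; tune per workload

# Each entry: (query_embedding, cached_answer). A real deployment would
# keep these in ElastiCache rather than in process memory.
semantic_cache: list[tuple[list[float], str]] = []

def cosine_similarity(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) *
                  math.sqrt(sum(y * y for y in b)))

def answer_query(query_vec, run_full_search):
    """Return (answer, was_cache_hit); near-duplicate queries hit the cache."""
    for cached_vec, cached_answer in semantic_cache:
        if cosine_similarity(cached_vec, query_vec) >= SIMILARITY_THRESHOLD:
            return cached_answer, True       # skip the expensive vector search
    answer = run_full_search(query_vec)      # full similarity search in Aurora
    semantic_cache.append((query_vec, answer))
    return answer, False

full_search = lambda vec: "Checked bags up to 23 kg fly free."
a1, hit1 = answer_query([1.0, 0.0, 0.0], full_search)
a2, hit2 = answer_query([0.99, 0.05, 0.0], full_search)  # near-paraphrase
```

The second query is not byte-identical to the first, so a plain key-value cache would miss it; because its embedding sits within the similarity threshold, the semantic cache returns the stored answer without recomputing any vector distances in the database.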
Scaling to Production with Amazon Bedrock AgentCore
To scale the chatbot architecture to a million daily queries, Flightly leveraged Amazon Bedrock AgentCore, a fully managed platform for building and deploying AI agents.
Bedrock AgentCore provides a runtime, identity management, a gateway, and an observability layer for running agents securely at scale.
This allows Flightly to run their agents in a scalable, highly available, and secure cloud environment, without having to manage the underlying infrastructure.
Business Impact
By implementing the caching and semantic search patterns, Flightly was able to achieve sub-millisecond response times for frequently asked questions and sub-100ms response times for more complex queries.
This resulted in a 60% reduction in infrastructure costs and a 40% increase in customer retention, as users no longer abandoned the platform due to slow response times.
The agentic AI architecture enabled Flightly to expand beyond simple Q&A and build a more sophisticated chatbot that can handle booking flows, payment processing, and other complex tasks.
Key Takeaways
Caching is essential for building high-performance conversational AI systems, especially at scale.
Semantic search and vector embeddings can significantly improve the understanding of user intent beyond simple keyword matching.
A tiered memory architecture, with short-term memory in ElastiCache and long-term memory in Aurora PostgreSQL, can provide seamless context preservation and multi-agent workflows.
Leveraging a managed platform like Amazon Bedrock AgentCore can simplify the deployment and scaling of production-ready conversational AI systems.