AWS re:Invent 2025 - Designing resilient Serverless Applications (API311)
Designing Resilient Serverless Applications
Importance of Resiliency
Resiliency is crucial for serverless applications to avoid costly message loss and downtime
Financial and brand impact of unreliable systems highlighted by Gartner research
Shared responsibility model for resiliency: AWS manages the infrastructure, while developers are responsible for application-level resiliency
Serverless Resiliency Challenges
Potential issues include service limits, misconfigurations, resource availability, and state management
Avoiding anti-patterns like excessive function chaining is key
AWS Well-Architected Framework and Serverless Lens
Well-Architected Framework's reliability pillar is the focus for resiliency
Serverless Lens provides deeper guidance on application-level resiliency best practices
Powertools for AWS Lambda codifies these best practices across multiple languages
Serverless Resiliency Features
API Gateway
Throttling at account, stage, and method levels to regulate inbound traffic
Method-level caching and CloudFront integration for reducing backend load
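API Gateway documents its throttling as a token bucket, so the behavior behind the rate and burst limits can be sketched in a few lines. This is a minimal local illustration, not API Gateway's implementation; the `TokenBucket` class, its parameters, and the injectable `clock` are assumptions made for the example.

```python
import time


class TokenBucket:
    """Token bucket: `rate` tokens refill per second, up to `burst` tokens."""

    def __init__(self, rate: float, burst: int, clock=time.monotonic):
        self.rate = rate
        self.burst = burst
        self.clock = clock  # injectable so the sketch can be tested without waiting
        self.tokens = float(burst)
        self.last = clock()

    def allow(self) -> bool:
        """Return True if a request may proceed, False if it should be throttled."""
        now = self.clock()
        # Refill in proportion to elapsed time, capped at the burst size.
        self.tokens = min(self.burst, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        # A throttled API Gateway request is answered with 429 Too Many Requests.
        return False
```

A burst limit above the steady-state rate lets short spikes through while the refill rate bounds sustained load on the backend.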
Lambda
Synchronous vs asynchronous invocation models have different resiliency characteristics
Asynchronous invocations leverage internal queuing and retries for improved reliability
Event source mappings (e.g. SQS, DynamoDB Streams) provide additional resiliency options
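One resiliency option for SQS event source mappings is the partial batch response: the function reports only the failed message IDs, so successfully processed messages are not redelivered. The sketch below assumes `ReportBatchItemFailures` is enabled on the event source mapping; `process_order` and the message shape are hypothetical placeholders for real business logic.

```python
import json


def process_order(order: dict) -> None:
    """Hypothetical business logic; raises on orders it cannot handle."""
    if order.get("quantity", 0) <= 0:
        raise ValueError("quantity must be positive")


def handler(event: dict, context=None) -> dict:
    """Process an SQS batch and report only the messages that failed."""
    failures = []
    for record in event.get("Records", []):
        try:
            process_order(json.loads(record["body"]))
        except Exception:
            # Only this message becomes visible on the queue again;
            # the rest of the batch is treated as processed.
            failures.append({"itemIdentifier": record["messageId"]})
    return {"batchItemFailures": failures}
```

Without this response shape, a single bad message causes the whole batch to be retried, which is why the handler catches errors per record instead of letting them propagate.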
EventBridge
Automatic retries with exponential backoff for event delivery
Ability to configure dead-letter queues and delivery windows
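The retry-then-dead-letter behavior described above can be illustrated locally. This is a sketch of the general technique (capped exponential backoff with full jitter, then a dead-letter list), not EventBridge's internal algorithm; `backoff_delays`, `deliver`, and all parameter names and defaults are assumptions for the example.

```python
import random
import time


def backoff_delays(attempts: int, base: float = 1.0, cap: float = 60.0, rng=random.random):
    """Return one randomized delay in seconds per retry ("full jitter")."""
    delays = []
    for attempt in range(attempts):
        ceiling = min(cap, base * 2 ** attempt)  # doubles each attempt, up to the cap
        delays.append(rng() * ceiling)           # random point in [0, ceiling)
    return delays


def deliver(event, target, dead_letters, max_attempts=3, sleep=time.sleep, rng=random.random):
    """Call target(event) up to max_attempts times; dead-letter the event on exhaustion."""
    delays = backoff_delays(max_attempts - 1, rng=rng)
    for attempt in range(max_attempts):
        try:
            target(event)
            return True
        except Exception:
            if attempt < len(delays):
                sleep(delays[attempt])  # wait before the next attempt
    dead_letters.append(event)  # kept for inspection or replay instead of being lost
    return False
```

The jitter spreads retries from many failed deliveries over time, so a recovering target is not hit by synchronized retry waves.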
Powertools for AWS Lambda
Provides utilities for batch processing, idempotency (deduplicating repeated requests), and other resiliency patterns
Simplifies implementation of best practices across languages
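The deduplication idea can be shown without any library: derive a key from the payload and return the stored result when the same payload arrives again. This is a hand-rolled sketch, not the library's API; the `idempotent` decorator, the in-memory `_results` dict (a stand-in for a durable store), and the `charge` function are all assumptions for illustration.

```python
import hashlib
import json

_results = {}  # in-memory stand-in for a durable store; an assumption of this sketch


def idempotent(func):
    """Run func once per distinct payload; repeats return the stored result."""
    def wrapper(payload: dict):
        # Sorting keys makes the hash independent of dict ordering.
        key = hashlib.sha256(json.dumps(payload, sort_keys=True).encode()).hexdigest()
        if key not in _results:
            _results[key] = func(payload)
        return _results[key]
    return wrapper


calls = []  # records side effects so duplicates are visible


@idempotent
def charge(payload: dict) -> dict:
    """Hypothetical side-effecting operation that must not run twice."""
    calls.append(payload["order_id"])
    return {"order_id": payload["order_id"], "status": "charged"}
```

Because retries and at-least-once delivery can hand the same message to a function more than once, a guard like this keeps a retried request from charging the customer twice.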
Resiliency Patterns
API Gateway to SQS/Step Functions
Bypassing Lambda to directly integrate API Gateway with queues or workflows
Decouples request handling from processing logic
Saga Pattern for Distributed Transactions
Leverages compensating actions to maintain consistency across multiple steps
Crucial for long-running, multi-service business processes
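The core of the saga pattern is small: each step is paired with a compensating action, and a failure triggers the compensations of the already completed steps in reverse order. The sketch below is plain Python for illustration only, not a Step Functions definition; `run_saga` and the step names in the usage example are hypothetical.

```python
def run_saga(steps):
    """Run (action, compensation) pairs in order; undo completed steps on failure."""
    completed = []
    for action, compensate in steps:
        try:
            action()
        except Exception:
            # Roll back in reverse order so later steps are undone first.
            for undo in reversed(completed):
                undo()
            return False
        completed.append(compensate)
    return True


if __name__ == "__main__":
    log = []

    def ship():
        raise RuntimeError("carrier unavailable")

    ok = run_saga([
        (lambda: log.append("reserve stock"), lambda: log.append("release stock")),
        (lambda: log.append("charge card"), lambda: log.append("refund card")),
        (ship, lambda: log.append("cancel shipment")),
    ])
    print(ok, log)
```

The failed step's own compensation is not run, since that step never completed. In a production workflow the compensations themselves also need retries, because a failed rollback leaves the system inconsistent.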
Validating Resiliency
Chaos Engineering with Fault Injection Service
Ability to inject failures, latency, and other disruptions to test system behavior
Helps identify weaknesses and validate recovery mechanisms
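The same idea can be tried in a unit test before running a managed experiment: wrap a dependency so that it sometimes fails or responds slowly, then check that the caller recovers. This is a local sketch of fault injection in general, not the AWS Fault Injection Service API; `inject_faults` and its parameters are assumptions for the example.

```python
import random
import time


def inject_faults(func, failure_rate=0.0, latency_s=0.0, rng=random.random, sleep=time.sleep):
    """Wrap func so calls are delayed by latency_s and fail with probability failure_rate."""
    def wrapper(*args, **kwargs):
        if latency_s:
            sleep(latency_s)  # simulated slow dependency
        if rng() < failure_rate:
            raise RuntimeError("injected fault")  # simulated dependency failure
        return func(*args, **kwargs)
    return wrapper


def lookup_price(sku: str) -> int:
    """Hypothetical downstream call used as the injection target."""
    return 100
```

Wrapping `lookup_price` with `failure_rate=0.2`, for example, would make roughly one call in five fail, which is enough to show whether retries, timeouts, and fallbacks in the calling code actually engage.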
Key Takeaways
Every message and request matters: you cannot afford to lose the "million-dollar widget"
Carefully consider service interactions and assumptions when designing architectures
Leverage serverless resiliency features and patterns to build more reliable applications
Validate resiliency through chaos engineering and fault injection testing