AWS re:Invent 2025 - Designing resilient Serverless Applications (API311)

Designing Resilient Serverless Applications

Importance of Resiliency

  • Resiliency is crucial for serverless applications to avoid costly message loss and downtime
  • Financial and brand impact of unreliable systems highlighted by Gartner research
  • Shared responsibility model for resiliency: AWS manages the underlying infrastructure, while developers are responsible for application-level resiliency

Serverless Resiliency Challenges

  • Common failure sources include service quotas, misconfigurations, resource availability, and state management
  • Avoiding anti-patterns such as excessive function chaining (Lambda functions directly invoking other functions) is key

AWS Well-Architected Framework and Serverless Lens

  • Well-Architected Framework's reliability pillar is the focus for resiliency
  • Serverless Lens provides deeper guidance on application-level resiliency best practices
  • Powertools for AWS Lambda codifies these best practices across multiple languages

Serverless Resiliency Features

API Gateway

  • Throttling (a steady-state rate plus a burst limit) at the account, stage, and method levels to protect backends from traffic spikes
  • Method-level caching and CloudFront integration for reducing backend load
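API Gateway documents its throttling as a token bucket: a refill rate sets the steady-state requests per second and the bucket size sets the allowed burst. The sketch below is a hypothetical, in-process illustration of that algorithm, not API Gateway's implementation; the class name and the injectable clock are assumptions made so the behavior is easy to test.

```python
import time


class TokenBucket:
    """Token bucket with a steady-state rate and a burst capacity (hypothetical sketch)."""

    def __init__(self, rate_per_sec: float, burst: int, clock=time.monotonic):
        self.rate = rate_per_sec
        self.capacity = burst
        self.tokens = float(burst)  # start full so an initial burst is allowed
        self.clock = clock
        self.last = clock()

    def allow(self) -> bool:
        now = self.clock()
        # Refill in proportion to elapsed time, capped at the burst size.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False  # a gateway would answer HTTP 429 Too Many Requests here


class FakeClock:
    """Manually advanced clock so the example is deterministic."""

    def __init__(self):
        self.t = 0.0

    def __call__(self) -> float:
        return self.t


clock = FakeClock()
bucket = TokenBucket(rate_per_sec=10, burst=5, clock=clock)
results = [bucket.allow() for _ in range(7)]  # burst of 5 passes, then throttled
clock.t += 0.5                                # half a second refills 5 tokens
after_refill = bucket.allow()
```

The burst absorbs short spikes, while the rate bounds sustained load on whatever sits behind the API.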

Lambda

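  • Synchronous and asynchronous invocations differ in who owns retries: with synchronous invocation the caller receives the error and must retry, while with asynchronous invocation Lambda retries on the caller's behalf
  • Asynchronous invocations are placed on an internal queue; Lambda retries failed invocations (two retries by default) and can send events that still fail to an on-failure destination or dead-letter queue
  • Event source mappings (e.g., SQS, DynamoDB Streams) poll the source and retry failed batches; partial batch responses avoid reprocessing records that already succeeded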
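For an SQS event source mapping with `ReportBatchItemFailures` enabled, a handler can return a `batchItemFailures` list so that only the failed messages return to the queue for retry, instead of the whole batch. This is the reporting that the Powertools batch processing utility automates. The sketch below writes it by hand; `process` and the sample event are hypothetical stand-ins for real business logic and a real SQS event.

```python
import json


def process(order: dict) -> None:
    """Hypothetical business logic; raises on a malformed order."""
    if "order_id" not in order:
        raise ValueError("missing order_id")


def handler(event: dict, context=None) -> dict:
    """SQS-triggered handler reporting partial batch failures.

    Assumes ReportBatchItemFailures is enabled on the event source mapping;
    without it, Lambda ignores this response shape.
    """
    failures = []
    for record in event.get("Records", []):
        try:
            process(json.loads(record["body"]))
        except Exception:
            # Report only this message; successful ones are deleted from the queue.
            failures.append({"itemIdentifier": record["messageId"]})
    return {"batchItemFailures": failures}


sample_event = {
    "Records": [
        {"messageId": "m1", "body": json.dumps({"order_id": 1})},
        {"messageId": "m2", "body": json.dumps({"sku": "widget"})},
        {"messageId": "m3", "body": "not json"},
    ]
}
result = handler(sample_event)
```

Messages `m2` and `m3` are retried (and eventually reach the queue's dead-letter queue if one is configured), while `m1` is not processed twice.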

EventBridge

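  • Automatic retries with exponential backoff and jitter when a target is unavailable (by default for up to 24 hours and 185 attempts)
  • Configurable retry policy (maximum event age and maximum retry attempts) and dead-letter queues for events that cannot be delivered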
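The retry-then-park behavior can be sketched as capped exponential backoff with full jitter, followed by a dead-letter queue once attempts run out. EventBridge's internal retry schedule is not exposed, so this is only an illustration of the technique; `deliver_with_retries`, its parameters, and the list standing in for a DLQ are hypothetical.

```python
import random
import time
from typing import Callable, List


def deliver_with_retries(
    event: dict,
    target: Callable[[dict], None],
    dead_letter_queue: List[dict],
    max_attempts: int = 4,
    base_delay: float = 0.5,
    max_delay: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Retry delivery with capped exponential backoff and full jitter.

    Returns True on success. After the final failed attempt the event is
    parked in the dead-letter queue instead of being dropped.
    """
    for attempt in range(max_attempts):
        try:
            target(event)
            return True
        except Exception:
            if attempt == max_attempts - 1:
                break
            # Full jitter: wait a random time up to the exponential cap.
            cap = min(max_delay, base_delay * (2 ** attempt))
            sleep(random.uniform(0, cap))
    dead_letter_queue.append(event)
    return False


calls = {"n": 0}


def flaky_target(event: dict) -> None:
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("target unavailable")


def broken_target(event: dict) -> None:
    raise ConnectionError("target down")


dlq: List[dict] = []
delays: List[float] = []
ok = deliver_with_retries({"id": "evt-1"}, flaky_target, dlq, sleep=delays.append)
failed = deliver_with_retries({"id": "evt-2"}, broken_target, dlq, sleep=lambda s: None)
```

Jitter spreads retries out so that many clients recovering at once do not hit the target in lockstep.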

Powertools for AWS Lambda

  • Provides utilities for batch processing, idempotency (safely handling duplicate deliveries), and other resiliency patterns
  • Simplifies implementation of best practices across languages
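Because retries and at-least-once delivery can hand the same event to a function more than once, the core idea of an idempotency utility is to derive a key from the payload and return the stored result for a repeat. The sketch below is a hypothetical in-memory version and not the Powertools API. A real deployment needs a durable shared store with expiry and in-progress locking (Powertools uses a persistence layer such as DynamoDB for this), all of which the sketch omits.

```python
import hashlib
import json
from typing import Any, Callable, Dict


def idempotent(store: Dict[str, Any]) -> Callable:
    """Decorator returning the stored result when a payload repeats (hypothetical)."""

    def decorator(func: Callable[[dict], Any]) -> Callable[[dict], Any]:
        def wrapper(payload: dict) -> Any:
            # Canonical JSON so key order does not change the idempotency key.
            key = hashlib.sha256(
                json.dumps(payload, sort_keys=True).encode()
            ).hexdigest()
            if key in store:
                return store[key]
            result = func(payload)
            store[key] = result
            return result

        return wrapper

    return decorator


charges = []
results_store: Dict[str, Any] = {}


@idempotent(results_store)
def charge_card(payload: dict) -> dict:
    charges.append(payload["amount"])  # side effect that must not repeat
    return {"status": "charged", "amount": payload["amount"]}


first = charge_card({"order_id": "o-1", "amount": 100})
retry = charge_card({"amount": 100, "order_id": "o-1"})  # same event redelivered
```

The redelivered event returns the original result without charging the card a second time.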

Resiliency Patterns

API Gateway to SQS/Step Functions

  • Direct service integrations let API Gateway send messages to SQS or start Step Functions executions without an intermediate Lambda function
  • Decouples request handling from processing logic
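With a direct integration there is no function code on the request path. The sketch below only models the resulting contract in plain Python: the front door enqueues the request and acknowledges it immediately with HTTP 202, and a separate consumer processes it later at its own pace. The in-process queue stands in for SQS, and all function and field names are hypothetical.

```python
import json
import queue
import uuid

# Stands in for the SQS queue that API Gateway writes to directly.
work_queue: "queue.Queue[dict]" = queue.Queue()


def accept_request(body: str) -> dict:
    """Front door: enqueue and acknowledge without doing the work."""
    message_id = str(uuid.uuid4())
    work_queue.put({"messageId": message_id, "body": body})
    return {"statusCode": 202, "body": json.dumps({"messageId": message_id})}


def drain(process) -> int:
    """Back end: a consumer works through queued messages independently."""
    handled = 0
    while not work_queue.empty():
        process(work_queue.get())
        handled += 1
    return handled


responses = [accept_request(json.dumps({"sku": s})) for s in ("a", "b")]
processed = []
count = drain(lambda msg: processed.append(json.loads(msg["body"])["sku"]))
```

If the consumer is slow or failing, requests still succeed and wait durably in the queue rather than being lost.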

Saga Pattern for Distributed Transactions

  • Leverages compensating actions to maintain consistency across multiple steps
  • Crucial for long-running, multi-service business processes
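The core of a saga is to run each step's action in order and, when one fails, to run the compensations of the already completed steps in reverse order. On AWS this orchestration is commonly expressed as a Step Functions workflow with error catching. The plain-Python sketch below only shows the control flow, and the step names are hypothetical. In practice each compensation should itself be idempotent and retried.

```python
from typing import Callable, List, Tuple

# (name, action, compensation)
Step = Tuple[str, Callable[[], None], Callable[[], None]]


def run_saga(steps: List[Step], log: List[str]) -> bool:
    """Run steps in order; on failure, undo completed steps in reverse."""
    completed: List[Step] = []
    for step in steps:
        name, action, _ = step
        try:
            action()
            log.append(f"done:{name}")
            completed.append(step)
        except Exception:
            log.append(f"failed:{name}")
            for done_name, _, compensate in reversed(completed):
                compensate()
                log.append(f"compensated:{done_name}")
            return False
    return True


def ship_fails() -> None:
    raise RuntimeError("carrier unavailable")


log: List[str] = []
ok = run_saga(
    [
        ("reserve_inventory", lambda: None, lambda: None),
        ("charge_payment", lambda: None, lambda: None),
        ("ship_order", ship_fails, lambda: None),
    ],
    log,
)
```

Here the failed shipment triggers a refund and then an inventory release, leaving the system consistent without a distributed lock.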

Validating Resiliency

Chaos Engineering with Fault Injection Service

  • Ability to inject failures, latency, and other disruptions to test system behavior
  • Helps identify weaknesses and validate recovery mechanisms
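An experiment injects a fault at a known rate and then checks whether the recovery mechanism holds. AWS Fault Injection Service runs such experiments against real resources. The sketch below is a hypothetical in-process analogue that wraps a dependency, injects errors at 30%, and measures how often simple retries still succeed. `FaultInjector` and `call_with_retry` are illustrative names, not part of any AWS SDK.

```python
import random
import time
from typing import Callable, Optional


class FaultInjector:
    """Wraps a dependency and injects errors or latency at configured rates."""

    def __init__(self, func: Callable, error_rate: float = 0.0,
                 latency_s: float = 0.0, rng: Optional[random.Random] = None,
                 sleep: Callable[[float], None] = time.sleep):
        self.func = func
        self.error_rate = error_rate
        self.latency_s = latency_s
        self.rng = rng or random.Random()
        self.sleep = sleep

    def __call__(self, *args, **kwargs):
        if self.latency_s:
            self.sleep(self.latency_s)  # simulated slow dependency
        if self.rng.random() < self.error_rate:
            raise ConnectionError("injected fault")
        return self.func(*args, **kwargs)


def call_with_retry(func: Callable, attempts: int = 3):
    """Recovery mechanism under test: immediate retries."""
    for attempt in range(attempts):
        try:
            return func()
        except ConnectionError:
            if attempt == attempts - 1:
                raise


# Experiment: 30% injected error rate, seeded for repeatability.
dependency = FaultInjector(lambda: "ok", error_rate=0.3,
                           rng=random.Random(42), sleep=lambda s: None)
successes = 0
for _ in range(1000):
    try:
        call_with_retry(dependency)
        successes += 1
    except ConnectionError:
        pass
```

Without retries roughly 70% of calls would succeed. With three attempts the expected success rate rises to about 97% (1 - 0.3^3), and the experiment confirms whether the system behaves that way.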

Key Takeaways

  • Every message and request is important - can't afford to lose the "million-dollar widget"
  • Carefully consider service interactions and assumptions when designing architectures
  • Leverage serverless resiliency features and patterns to build more reliable applications
  • Validate resiliency through chaos engineering and fault injection testing
