AWS re:Invent 2025 - Solving the Observability Mystery with AWS Step Functions (API321)

Solving the Observability Mystery with AWS Step Functions

Building a Scalable Wind Speed Analysis Workflow

  • Demonstrated building a serverless workflow using AWS Step Functions to analyze a large dataset of global wind speed data
  • Leveraged the Distributed Map state to efficiently process over 600,000 objects stored in S3
  • Configured the Distributed Map with:
    • Batches of 500 objects per child workflow execution
    • Concurrency limit of 1,000 parallel child executions
    • Tolerated failure threshold of 5%
    • Output to a separate S3 bucket
  • Included Lambda functions to:
    • Analyze the wind speed data and calculate the mean
    • Consolidate and generate the final output
    • Convert the wind speed units from knots to miles per hour
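
The Distributed Map settings listed above could be expressed in Amazon States Language roughly as follows. This is a minimal sketch, not the definition shown in the talk: the bucket names, function names, state names, output prefix, and the STANDARD child execution type are all placeholders assumed for illustration. The definition is built as a Python dictionary so it can be printed as JSON.

```python
import json
import math

# Placeholder names (assumptions for illustration, not taken from the talk).
SOURCE_BUCKET = "example-wind-speed-data"
RESULT_BUCKET = "example-wind-speed-results"
ANALYZE_FUNCTION = "example-analyze-wind-speed"
CONSOLIDATE_FUNCTION = "example-consolidate-results"


def lambda_task(function_name: str, **transition) -> dict:
    """Task state that invokes a Lambda function with the state input as payload."""
    return {
        "Type": "Task",
        "Resource": "arn:aws:states:::lambda:invoke",
        "Parameters": {"FunctionName": function_name, "Payload.$": "$"},
        **transition,
    }


def build_definition() -> dict:
    """Build a state machine definition using the settings from the summary."""
    distributed_map = {
        "Type": "Map",
        # Read the list of objects to process straight from S3.
        "ItemReader": {
            "Resource": "arn:aws:states:::s3:listObjectsV2",
            "Parameters": {"Bucket": SOURCE_BUCKET},
        },
        # 500 objects per child workflow execution.
        "ItemBatcher": {"MaxItemsPerBatch": 500},
        # At most 1,000 child executions in flight at once.
        "MaxConcurrency": 1000,
        # The map run fails once more than 5% of items have failed.
        "ToleratedFailurePercentage": 5,
        "ItemProcessor": {
            # Assumption: the summary does not say STANDARD or EXPRESS.
            "ProcessorConfig": {"Mode": "DISTRIBUTED", "ExecutionType": "STANDARD"},
            "StartAt": "AnalyzeBatch",
            "States": {"AnalyzeBatch": lambda_task(ANALYZE_FUNCTION, End=True)},
        },
        # Write child execution results to a separate bucket.
        "ResultWriter": {
            "Resource": "arn:aws:states:::s3:putObject",
            "Parameters": {"Bucket": RESULT_BUCKET, "Prefix": "map-run-output"},
        },
        "Next": "ConsolidateResults",
    }
    return {
        "StartAt": "AnalyzeWindSpeeds",
        "States": {
            "AnalyzeWindSpeeds": distributed_map,
            "ConsolidateResults": lambda_task(CONSOLIDATE_FUNCTION, End=True),
        },
    }


def minimum_batches(object_count: int, batch_size: int = 500) -> int:
    """Number of child executions needed to cover object_count items."""
    return math.ceil(object_count / batch_size)


if __name__ == "__main__":
    print(json.dumps(build_definition(), indent=2))
    print(minimum_batches(600_000))
```

With 500 objects per batch, 600,000 objects need at least 1,200 child executions, so a concurrency limit of 1,000 means they cannot all run in a single wave.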

Observability and Monitoring for Step Functions

  • Discussed the importance of observability when working with asynchronous, distributed workflows
  • Highlighted the new metrics recently launched by AWS for Step Functions:
    • Open Map Run Limit: The maximum number of concurrent map runs allowed
    • Open Map Run Count: The current number of open map runs
    • Map Run Backlog Size: The number of map runs waiting to be executed
  • Demonstrated monitoring these metrics using Amazon CloudWatch and setting alarms to proactively identify issues
  • Explained how these metrics help identify when Step Functions is throttling a workflow because it has hit the state transition limit
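
An alarm on one of these metrics could be set up along the following lines. This is a hedged sketch: `AWS/States` is the CloudWatch namespace for Step Functions, but the metric name is passed in as a parameter because the exact spelling should be checked in the CloudWatch console, and the `StateMachineArn` dimension, the period, and the threshold are assumptions chosen for illustration. The function only builds the arguments; it makes no AWS calls.

```python
def map_run_alarm_kwargs(
    metric_name: str, state_machine_arn: str, threshold: float = 0.0
) -> dict:
    """Build keyword arguments for the CloudWatch PutMetricAlarm operation."""
    return {
        "AlarmName": f"sfn-{metric_name}-above-{threshold:g}",
        "Namespace": "AWS/States",
        # Assumption: caller supplies the exact metric name as published.
        "MetricName": metric_name,
        # Assumption: the metric is reported per state machine.
        "Dimensions": [{"Name": "StateMachineArn", "Value": state_machine_arn}],
        "Statistic": "Maximum",
        "Period": 60,
        # Require three consecutive breaching minutes before alarming.
        "EvaluationPeriods": 3,
        "Threshold": threshold,
        "ComparisonOperator": "GreaterThanThreshold",
        # No datapoints means no map runs, which is not a problem.
        "TreatMissingData": "notBreaching",
    }


def open_map_run_utilization(open_count: float, open_limit: float) -> float:
    """Fraction of the open map run limit currently in use."""
    if open_limit <= 0:
        raise ValueError("open_limit must be positive")
    return open_count / open_limit
```

With boto3 installed and credentials configured, the result could be passed on as `boto3.client("cloudwatch").put_metric_alarm(**kwargs)`. Alarming on a backlog greater than zero flags map runs that are waiting rather than executing, while the utilization helper shows how close the open count is to the limit.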

Debugging Cross-Account Integrations

  • Explored a scenario where a parent Step Functions workflow invokes a child workflow in a different AWS account
  • Highlighted the importance of establishing the correct trust relationship and IAM permissions between the accounts
  • Demonstrated how the parent workflow can get stuck waiting for the child workflow to complete when the "describe execution" permission (states:DescribeExecution) is missing
  • Explained the backup polling mechanism that Step Functions falls back on when completion events are not delivered, and how relying on it can delay the parent workflow
  • Recommended adding the "describe execution" and "stop execution" permissions (states:DescribeExecution and states:StopExecution) to the child workflow's IAM role so that the parent workflow can monitor and manage the child execution
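
The trust relationship and permissions described above might look like the following pair of IAM policy documents. This is a sketch under stated assumptions: it assumes the permissions sit on a role in the child account that the parent workflow's execution role assumes (the summary calls this the child workflow's IAM role), and the function names and all ARNs in the usage are hypothetical. The IAM actions themselves (`sts:AssumeRole`, `states:StartExecution`, `states:DescribeExecution`, `states:StopExecution`) are real.

```python
def execution_arn_pattern(state_machine_arn: str) -> str:
    """Derive the ARN pattern that matches every execution of a state machine."""
    prefix, name = state_machine_arn.rsplit(":stateMachine:", 1)
    return f"{prefix}:execution:{name}:*"


def trust_policy(parent_role_arn: str) -> dict:
    """Trust policy letting the parent account's role assume the child-account role."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"AWS": parent_role_arn},
                "Action": "sts:AssumeRole",
            }
        ],
    }


def permissions_policy(child_state_machine_arn: str) -> dict:
    """Permissions needed to start, monitor, and stop the child workflow."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Start the child workflow.
                "Effect": "Allow",
                "Action": "states:StartExecution",
                "Resource": child_state_machine_arn,
            },
            {
                # Without these, the parent cannot check on or cancel the child.
                "Effect": "Allow",
                "Action": ["states:DescribeExecution", "states:StopExecution"],
                "Resource": execution_arn_pattern(child_state_machine_arn),
            },
        ],
    }
```

Note that the start permission applies to the state machine ARN, while the describe and stop permissions apply to execution ARNs, which is why the second statement uses a different resource pattern.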

Key Takeaways

  • AWS Step Functions provides the building blocks for scalable, observable serverless workflows
  • Monitoring the new Step Functions map run metrics, and watching for state transition throttling, is crucial for identifying and resolving issues proactively
  • Careful planning of cross-account integrations, including IAM permissions and trust relationships, is essential to ensure smooth workflow execution and observability

Technical Details

  • AWS Step Functions
  • Distributed Map state
  • Lambda functions
  • Amazon S3
  • Amazon CloudWatch metrics and alarms
  • IAM roles and permissions

Business Impact

  • Enables the processing and analysis of large, distributed datasets in a scalable, serverless manner
  • Provides deep visibility into the execution of complex, asynchronous workflows to quickly identify and resolve issues
  • Facilitates seamless integration between different AWS services and accounts, unlocking new opportunities for collaboration and reuse

Examples

  • Wind speed data analysis workflow processing over 600,000 objects
  • Monitoring Step Functions metrics to identify and address state transition throttling
  • Troubleshooting a cross-account integration issue caused by missing IAM permissions
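
The arithmetic behind the wind speed example (a mean per batch, a unit conversion, and a final consolidation) can be sketched in a few functions. The function names and the shape of the per-batch result are assumptions made for illustration; the summary does not show the actual Lambda code. Reading objects from S3 is left out so the sketch stays self-contained.

```python
from statistics import fmean
from typing import Dict, List

# 1 knot is 1 nautical mile per hour, about 1.15078 statute miles per hour.
KNOTS_TO_MPH = 1.15078


def knots_to_mph(knots: float) -> float:
    """Convert a wind speed from knots to miles per hour."""
    return knots * KNOTS_TO_MPH


def summarize_batch(readings_knots: List[float]) -> Dict[str, float]:
    """Mean wind speed of one batch in mph, plus the number of readings."""
    if not readings_knots:
        return {"mean_mph": 0.0, "count": 0}
    return {
        "mean_mph": knots_to_mph(fmean(readings_knots)),
        "count": len(readings_knots),
    }


def consolidate(batch_summaries: List[Dict[str, float]]) -> float:
    """Overall mean in mph, weighting each batch mean by its reading count."""
    total = sum(s["count"] for s in batch_summaries)
    if total == 0:
        raise ValueError("no readings to consolidate")
    return sum(s["mean_mph"] * s["count"] for s in batch_summaries) / total
```

Each batch returns its count alongside its mean because batches can differ in size: averaging the batch means directly would give small batches too much weight.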
