AWS re:Invent 2025 - Best practices to simplify resilience at scale for Gen AI data & apps (STG317)

Simplifying Resilience at Scale for Gen AI Data & Apps

Importance of Resilience in the Cloud

  • 52% of data leaders surveyed by AWS said their data foundations are not ready for AI implementation
  • Common resilience challenges in the cloud:
    • Accidental deletion of critical data
    • Software issues causing service disruptions
    • Malicious attacks like ransomware

Resilience for NoSQL Databases (DynamoDB)

  • DynamoDB is a popular choice for storing user profile and watch list data because it scales by partitioning tables
  • However, data in DynamoDB is still exposed to issues such as:
    • A canary deployment corrupting data in some partitions
    • No single known-good recovery point across partitions
  • Recovery process without Clumio is complex and time-consuming:
    • Requires restoring each impacted partition individually
    • Involves cherry-picking data and reconfiguring applications
  • Clumio Backtrack for DynamoDB simplifies recovery:
    • Allows in-place restoration to any point in time
    • No need to reconfigure applications or create temporary tables
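
To illustrate why recovery without in-place restoration involves reconfiguring applications, here is a minimal sketch of a native DynamoDB point-in-time restore request. It only builds the arguments for boto3's `restore_table_to_point_in_time` call and does not contact AWS. The table names and the helper `build_pitr_restore_request` are hypothetical, and this is not a description of how Clumio Backtrack works internally.

```python
from datetime import datetime, timezone
from typing import Optional


def build_pitr_restore_request(source_table: str,
                               target_table: str,
                               restore_time: Optional[datetime] = None) -> dict:
    """Build keyword arguments for a DynamoDB point-in-time restore.

    Hypothetical helper: it prepares the request only, it does not call AWS.
    """
    if target_table == source_table:
        # A native point-in-time restore writes to a new table,
        # so the target cannot be the source table itself.
        raise ValueError("target table must differ from the source table")

    request = {"SourceTableName": source_table,
               "TargetTableName": target_table}
    if restore_time is None:
        # No explicit time given: ask for the most recent restorable point.
        request["UseLatestRestorableTime"] = True
    else:
        if restore_time.tzinfo is None:
            raise ValueError("restore_time must be timezone-aware")
        request["RestoreDateTime"] = restore_time.astimezone(timezone.utc)
    return request


# Hypothetical table names and recovery time, for illustration only.
request = build_pitr_restore_request(
    "watchlist",
    "watchlist-restored",
    datetime(2025, 12, 1, 9, 30, tzinfo=timezone.utc),
)
print(request["TargetTableName"])
```

With boto3 installed and credentials configured, the request could be passed as `boto3.client("dynamodb").restore_table_to_point_in_time(**request)`. Because the restored data lands in a separate table, the application then has to be pointed at that table or the data copied back, which is the reconfiguration step listed above.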

Resilience for LLM-Powered Chatbots (S3 & Vectors)

  • Chatbot functionality relies on movie data stored in S3 and on vector embeddings built from that data
  • Loss of S3 data can render the vector store useless, causing chatbot failures
  • Recovery without Clumio is complex:
    • Requires a full S3 bucket restore
    • Requires recomputing the vectors and reconfiguring the LLM
  • Clumio Backtrack for S3 enables simple recovery:
    • Granular recovery of only impacted objects
    • No need to recompute vectors or reconfigure the LLM
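
The idea of restoring only the impacted objects can be sketched with plain S3 versioning. The example below assumes the bucket has versioning enabled and that its version history has already been listed; the entries follow the shape of the `Versions` and `DeleteMarkers` lists in an S3 ListObjectVersions response (`Key`, `VersionId`, `LastModified`). The function `plan_object_rollback`, the object keys, and the timestamps are hypothetical, and this is an illustration of the concept rather than Clumio's implementation.

```python
from datetime import datetime, timezone


def plan_object_rollback(versions, delete_markers, known_good):
    """Return {key: version_id} for objects changed after `known_good`.

    Only keys that were overwritten or deleted after the known-good time
    appear in the plan; untouched objects are skipped.
    """
    history = {}
    for entry in versions:
        history.setdefault(entry["Key"], []).append((entry, False))
    for entry in delete_markers:
        history.setdefault(entry["Key"], []).append((entry, True))

    plan = {}
    for key, entries in history.items():
        entries.sort(key=lambda pair: pair[0]["LastModified"])
        current, _ = entries[-1]
        as_of = [p for p in entries if p[0]["LastModified"] <= known_good]
        if not as_of:
            continue  # object did not exist at the known-good time
        good, good_is_marker = as_of[-1]
        if good_is_marker or good["VersionId"] == current["VersionId"]:
            continue  # deleted at that time, or unchanged since then
        plan[key] = good["VersionId"]
    return plan


def at(hour):
    return datetime(2025, 12, 1, hour, tzinfo=timezone.utc)


versions = [
    {"Key": "movies/a.json", "VersionId": "v1", "LastModified": at(8)},
    {"Key": "movies/a.json", "VersionId": "v2", "LastModified": at(11)},  # overwritten
    {"Key": "movies/b.json", "VersionId": "v1", "LastModified": at(7)},   # untouched
    {"Key": "movies/c.json", "VersionId": "v1", "LastModified": at(6)},
]
delete_markers = [
    {"Key": "movies/c.json", "VersionId": "d1", "LastModified": at(12)},  # deleted
]

plan = plan_object_rollback(versions, delete_markers, known_good=at(10))
print(plan)
```

Here only `movies/a.json` and `movies/c.json` end up in the plan, while `movies/b.json` is left alone. Applying the plan (for example, copying each chosen version back over the current object) is left out of the sketch.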

Resilience for Data Lakehouse (Apache Iceberg on S3)

  • Movie insights feature uses an Apache Iceberg data lakehouse on S3
  • Iceberg data is vulnerable to schema changes and data overwrites
  • Recovery without Clumio is challenging:
    • Requires restoring S3 data and rebuilding the Iceberg table structure
    • Requires reconfiguring applications and dashboards
  • Clumio Backtrack for Iceberg provides seamless recovery:
    • Preserves the Iceberg table structure during backup and restore
    • Supports converting between Glue and S3 Tables catalogs
    • Enables point-in-time recovery without application reconfiguration
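
Point-in-time recovery for an Iceberg table comes down to identifying which snapshot was current at the chosen time. The sketch below reads the `snapshot-log` entries (`timestamp-ms`, `snapshot-id`) that Iceberg table metadata records whenever the current snapshot changes. The helper `snapshot_as_of` and the sample metadata are hypothetical, and this illustrates the concept only, not how Clumio Backtrack for Iceberg is implemented.

```python
from typing import Optional


def snapshot_as_of(metadata: dict, timestamp_ms: int) -> Optional[int]:
    """Return the id of the snapshot that was current at `timestamp_ms`.

    Returns None when the table had no snapshot yet at that time.
    """
    eligible = [entry for entry in metadata.get("snapshot-log", [])
                if entry["timestamp-ms"] <= timestamp_ms]
    if not eligible:
        return None
    # The most recent log entry at or before the requested time wins.
    latest = max(eligible, key=lambda entry: entry["timestamp-ms"])
    return latest["snapshot-id"]


# Hypothetical, heavily trimmed table metadata for illustration.
metadata = {
    "current-snapshot-id": 3,
    "snapshot-log": [
        {"timestamp-ms": 1000, "snapshot-id": 1},
        {"timestamp-ms": 2000, "snapshot-id": 2},
        {"timestamp-ms": 3000, "snapshot-id": 3},  # e.g. a bad overwrite
    ],
}

print(snapshot_as_of(metadata, 2500))
```

A snapshot found this way is only useful if its metadata and data files still exist in S3, which is why the table structure has to be preserved through backup and restore.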

Key Recommendations for Resilient Gen AI Apps

  1. Protect the entire data pipeline, not just individual components
  2. Ensure fast recovery to minimize user/business disruption
  3. Architect for dynamic, scalable cloud environments

Clumio's Approach to Resilience

  1. Recovery in place to avoid application reconfiguration
  2. Automated discovery of existing and new cloud resources
  3. Fully serverless and elastic architecture to scale with the cloud

Next Steps

  • Sign up for a free 14-day trial of Clumio on AWS Marketplace
  • Learn more about Clumio Backtrack for DynamoDB and Iceberg via the provided QR codes
