AWS re:Invent 2025 - Cloud Resilience in the AI Era (HMC307)

Navigating the AI Era: Ensuring Cloud Resilience

The Rise of Agentic AI and Enterprise Risks

  • Agentic AI refers to large language models (LLMs) with the ability to take actions on a user's behalf
  • Agentic AI is expected to drive significant enterprise productivity and workflow automation
  • However, the rise of agentic AI also introduces new risks:
    • AI agents operate at superhuman speeds, so a single mistake can potentially cause 10x the damage in one-tenth of the time
    • AI agents act through non-human identities and credentials, inside identity and access ecosystems that were not designed with AI agents in mind
    • Enterprises lack visibility and control over the AI agents running in their environments

Key Concerns for Enterprises Adopting Agentic AI

  • Lack of a centralized view of all AI agents running in the environment
  • Uncertainty around the tools and data that AI agents can access
  • Difficulty in recovering from AI mistakes or unintended actions

Rubrik's Approach: The Agent Operations Platform

Rubrik has introduced Rubrik Agent Cloud, an agent operations platform that addresses three key challenges of agentic AI adoption:

1. Monitoring and Observability

  • Automatically discovers AI agents across the environment
  • Provides an inventory of agents, their credentials, and the tools/data they can access
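To make the inventory idea concrete, here is a minimal sketch of the kind of record such an inventory might keep, assuming an agent can be described by a name, the non-human credential it runs as, and the tools and data it touches. `AgentRecord`, `AgentInventory`, and all agent, role, and data names are hypothetical illustrations, not Rubrik's schema or API.

```python
from dataclasses import dataclass, field


@dataclass
class AgentRecord:
    """One discovered AI agent (hypothetical schema, for illustration only)."""
    name: str
    credential_id: str  # the non-human identity the agent acts as
    tools: set[str] = field(default_factory=set)
    data_sources: set[str] = field(default_factory=set)


class AgentInventory:
    """Central registry giving a single view of all discovered agents."""

    def __init__(self) -> None:
        self._agents: dict[str, AgentRecord] = {}

    def register(self, record: AgentRecord) -> None:
        # Re-discovering a known agent merges newly observed tools and data.
        existing = self._agents.get(record.name)
        if existing is None:
            self._agents[record.name] = record
        else:
            existing.tools |= record.tools
            existing.data_sources |= record.data_sources

    def agents_with_access_to(self, data_source: str) -> list[str]:
        # Answers "which agents can reach this data?"
        return sorted(a.name for a in self._agents.values() if data_source in a.data_sources)

    def agents_sharing_credential(self) -> dict[str, list[str]]:
        # Credentials used by more than one agent blur accountability.
        by_cred: dict[str, list[str]] = {}
        for a in self._agents.values():
            by_cred.setdefault(a.credential_id, []).append(a.name)
        return {cred: sorted(names) for cred, names in by_cred.items() if len(names) > 1}


inv = AgentInventory()
inv.register(AgentRecord("support-bot", "role/agent-a", {"crm.search"}, {"crm_db"}))
inv.register(AgentRecord("billing-bot", "role/agent-a", {"invoice.create"}, {"billing_db"}))
inv.register(AgentRecord("support-bot", "role/agent-a", {"email.send"}, {"billing_db"}))
print(inv.agents_with_access_to("billing_db"))  # ['billing-bot', 'support-bot']
print(inv.agents_sharing_credential())          # {'role/agent-a': ['billing-bot', 'support-bot']}
```

Merging on re-discovery matters because an agent's reach tends to grow over time; the second `support-bot` sighting is what reveals its access to billing data.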

2. Governance

  • Translates enterprise policies into software-enforced configurations
  • Ensures AI agents operate within defined boundaries
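One way to picture "policy as software-enforced configuration" is a guard evaluated before every tool call. The sketch below is a hypothetical illustration, not Rubrik's policy engine: it assumes a policy can be reduced to an allow-list of tools, a set of blocked data classifications, and a rate limit (a guard against agents acting at superhuman speed). `AgentPolicy`, `enforce`, and the tool names are made up for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AgentPolicy:
    """Hypothetical machine-enforceable form of an enterprise policy."""
    allowed_tools: frozenset[str]
    blocked_data_classes: frozenset[str]  # e.g. data classified as "pii"
    max_calls_per_minute: int


class PolicyViolation(Exception):
    pass


def enforce(policy: AgentPolicy, tool: str, data_classes: set[str], calls_last_minute: int) -> None:
    """Raise PolicyViolation if a proposed tool call falls outside the policy."""
    if tool not in policy.allowed_tools:
        raise PolicyViolation(f"tool {tool!r} is not allowed")
    blocked = data_classes & policy.blocked_data_classes
    if blocked:
        raise PolicyViolation(f"call touches blocked data classes: {sorted(blocked)}")
    if calls_last_minute >= policy.max_calls_per_minute:
        raise PolicyViolation("rate limit exceeded")


policy = AgentPolicy(frozenset({"crm.search", "email.send"}), frozenset({"pii"}), 30)
enforce(policy, "crm.search", {"public"}, calls_last_minute=5)  # within bounds: returns None
try:
    enforce(policy, "db.drop_table", set(), calls_last_minute=0)
except PolicyViolation as err:
    print(err)  # tool 'db.drop_table' is not allowed
```

Denying by default (anything not on the allow-list is rejected) keeps the agent inside defined boundaries even when it attempts an action nobody anticipated.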

3. Remediation

  • Provides "AI Agent Rewind" functionality to undo destructive actions taken by AI agents
  • Builds on Rubrik's data backup and restore capabilities to recover from AI-related incidents
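The core idea behind rewinding an agent's actions can be shown with a generic journal-and-undo pattern: record the pre-image of every value an agent changes, then restore those pre-images in reverse order. This is a minimal in-memory sketch under that assumption, not how Rubrik implements Agent Rewind (which the talk ties to backup and restore); `RewindableStore` and its methods are hypothetical.

```python
from dataclasses import dataclass
from typing import Any, Dict, List

_MISSING = object()  # sentinel: the key did not exist before the action


@dataclass
class JournalEntry:
    agent: str
    key: str
    before: Any  # value before the action, or _MISSING


class RewindableStore:
    """Hypothetical key-value store that journals pre-images of every change."""

    def __init__(self) -> None:
        self.data: Dict[str, Any] = {}
        self.journal: List[JournalEntry] = []

    def _record(self, agent: str, key: str) -> None:
        self.journal.append(JournalEntry(agent, key, self.data.get(key, _MISSING)))

    def put(self, agent: str, key: str, value: Any) -> None:
        self._record(agent, key)
        self.data[key] = value

    def delete(self, agent: str, key: str) -> None:
        self._record(agent, key)
        self.data.pop(key, None)

    def rewind(self, agent: str) -> int:
        """Undo every change made by `agent`, newest first; return the count.

        Naive: it does not detect later writes by other actors to the same
        keys, which a real system would have to reconcile.
        """
        undone = 0
        for entry in reversed(self.journal):
            if entry.agent != agent:
                continue
            if entry.before is _MISSING:
                self.data.pop(entry.key, None)
            else:
                self.data[entry.key] = entry.before
            undone += 1
        self.journal = [e for e in self.journal if e.agent != agent]
        return undone


store = RewindableStore()
store.put("human", "config", "v1")
store.put("human", "customers", ["acme"])
store.put("agent-x", "config", "v2")
store.delete("agent-x", "customers")  # the destructive action
store.put("agent-x", "tmp", 1)
print(store.rewind("agent-x"))  # 3
print(store.data)               # {'config': 'v1', 'customers': ['acme']}
```

Undoing newest-first is what makes repeated writes to the same key come out right: each intermediate value is restored in turn until the original pre-image is back.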

Ensuring Cloud Data Resilience

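  • Under the cloud shared responsibility model, the provider is responsible for the resilience of the cloud infrastructure, while the customer remains responsible for protecting its own data and workloads; this split is often misunderstood, leaving confusion and protection gaps in enterprise organizations
  • Rubrik helps enterprises address three aspects of cloud data resilience: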
    • High availability: Designing for common failures and operational outages
    • Disaster recovery: Planning for regional outages and natural disasters
    • Cyber resilience: Protecting data from cyber attacks and enabling effective recovery

Tiering Data Resilience

  • Rubrik recommends a tiered approach to data resilience based on the criticality of each data set or workload:
    • Tier 1 (mission-critical): Requires local, remote, and cyber-vaulted copies
    • Tier 2 (business-critical): Requires local and remote copies, with an optional cyber-vaulted copy
    • Tier 3 (less critical): May only require local copies, provided the environment can easily be reproduced
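The tiering rule above is essentially a lookup table, which makes compliance checks easy to automate. A minimal sketch, assuming each workload reports which copy types it currently has; `Copy`, `REQUIRED_COPIES`, and `missing_copies` are hypothetical names, and Tier 2's optional vault copy is deliberately not counted as required.

```python
from enum import Enum


class Copy(Enum):
    LOCAL = "local"
    REMOTE = "remote"
    CYBER_VAULT = "cyber_vault"


# Required copy types per tier, following the tiers listed above.
REQUIRED_COPIES = {
    1: {Copy.LOCAL, Copy.REMOTE, Copy.CYBER_VAULT},  # mission-critical
    2: {Copy.LOCAL, Copy.REMOTE},                    # business-critical (vault optional)
    3: {Copy.LOCAL},                                 # less critical, reproducible
}


def missing_copies(tier: int, existing: set[Copy]) -> set[Copy]:
    """Return the copy types a workload still needs to meet its tier."""
    return REQUIRED_COPIES[tier] - existing


# A Tier 1 workload with only a local snapshot is missing two copies.
print(sorted(c.value for c in missing_copies(1, {Copy.LOCAL})))  # ['cyber_vault', 'remote']
print(missing_copies(3, {Copy.LOCAL}))                           # set()
```

Expressing the tiers as data rather than scattered `if` statements means changing a tier's requirements is a one-line edit.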

Addressing the S3 Data Resilience Blind Spot

  • S3 data is growing rapidly, and an increasing share of it is sensitive
  • S3 Versioning and replication alone are not enough: they do not fully protect against accidental or malicious deletions and malware, and they do not solve recovery at scale
  • Rubrik helps enterprises:
    • Discover sensitive data in S3 buckets
    • Keep immutable, cyber-vaulted backups of S3 data
    • Detect and quarantine malware in backup data
    • Recover rapidly and reliably from S3 data incidents
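Why versioning alone falls short can be illustrated with a toy model. In S3, a plain delete on a versioned bucket only adds a delete marker, but a principal allowed to delete specific object versions can remove the history permanently. An immutable copy with a retention lock (the idea behind write-once-read-many storage such as S3 Object Lock) refuses that deletion. The sketch below is a simplified in-memory model, not the S3 API; `VersionedBucket` and `ImmutableVault` are hypothetical.

```python
from datetime import datetime, timedelta, timezone
from typing import Dict, List, Optional, Tuple


class VersionedBucket:
    """Toy model of a versioned bucket (not the real S3 API)."""

    def __init__(self) -> None:
        # key -> versions, oldest first; None stands for a delete marker
        self.versions: Dict[str, List[Optional[bytes]]] = {}

    def put(self, key: str, body: bytes) -> None:
        self.versions.setdefault(key, []).append(body)

    def delete(self, key: str) -> None:
        # A plain delete only adds a delete marker; older versions survive.
        self.versions.setdefault(key, []).append(None)

    def delete_all_versions(self, key: str) -> None:
        # A principal allowed to delete specific versions can erase the history.
        self.versions.pop(key, None)

    def recover(self, key: str) -> Optional[bytes]:
        # Return the newest version that is not a delete marker, if any survives.
        for body in reversed(self.versions.get(key, [])):
            if body is not None:
                return body
        return None


class ImmutabilityError(Exception):
    pass


class ImmutableVault:
    """Write-once store that refuses deletion until a retention period ends."""

    def __init__(self, retention: timedelta) -> None:
        self.retention = retention
        self._copies: Dict[str, Tuple[bytes, datetime]] = {}

    def store(self, key: str, body: bytes, now: datetime) -> None:
        if key in self._copies:
            raise ImmutabilityError(f"{key!r} already stored; copies are write-once")
        self._copies[key] = (body, now + self.retention)

    def delete(self, key: str, now: datetime) -> None:
        _, retain_until = self._copies[key]
        if now < retain_until:
            raise ImmutabilityError(f"{key!r} is locked until {retain_until.isoformat()}")
        del self._copies[key]

    def recover(self, key: str) -> Optional[bytes]:
        entry = self._copies.get(key)
        return entry[0] if entry else None


now = datetime(2025, 12, 1, tzinfo=timezone.utc)
bucket = VersionedBucket()
vault = ImmutableVault(retention=timedelta(days=30))
bucket.put("report.csv", b"q3 numbers")
vault.store("report.csv", b"q3 numbers", now)

bucket.delete("report.csv")
print(bucket.recover("report.csv"))  # b'q3 numbers': versioning covers a plain delete
bucket.delete_all_versions("report.csv")
print(bucket.recover("report.csv"))  # None: the version history itself is gone

try:
    vault.delete("report.csv", now + timedelta(days=1))
except ImmutabilityError as err:
    print("vault refused:", err)
print(vault.recover("report.csv"))  # b'q3 numbers'
```

The vault's protection comes from enforcing retention at the storage layer, so even credentials that can issue deletes cannot shorten it.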

Rubrik's Cloud-Native Architecture

  • Rubrik Security Cloud is a SaaS platform that provides unified visibility and management of data resilience across cloud, on-premises, and AI environments
  • Uses cross-account IAM roles and least-privilege access principles for secure, automated data protection
  • Uses a restricted data processing account and a separate data bunker account to process and store backups securely
  • Provides tag-based automation for applying backup policies based on data criticality
  • Integrates with infrastructure-as-code for seamless recovery and environment redeployment
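Tag-based automation can be pictured as a function from a resource's tags to a backup policy. The sketch below is hypothetical throughout: the tag key `backup-tier`, the policy names, and the frequency and retention values are illustrative placeholders, not Rubrik defaults or AWS conventions. It does show one useful design choice: untagged or mis-tagged resources fall back to a default policy instead of going unprotected.

```python
from dataclasses import dataclass
from typing import Dict

TAG_KEY = "backup-tier"  # hypothetical tag key; use your organization's convention


@dataclass(frozen=True)
class BackupPolicy:
    name: str
    frequency_hours: int
    retention_days: int
    vaulted: bool


# Hypothetical policy catalog keyed by tag value (placeholder numbers).
POLICIES: Dict[str, BackupPolicy] = {
    "tier1": BackupPolicy("gold", frequency_hours=4, retention_days=90, vaulted=True),
    "tier2": BackupPolicy("silver", frequency_hours=12, retention_days=30, vaulted=False),
    "tier3": BackupPolicy("bronze", frequency_hours=24, retention_days=7, vaulted=False),
}
DEFAULT_TIER = "tier2"  # untagged resources get a safe default rather than no backup


def assign_policies(resources: Dict[str, Dict[str, str]]) -> Dict[str, BackupPolicy]:
    """Map each resource ID to a backup policy based on its tags."""
    assignments = {}
    for resource_id, tags in resources.items():
        tier = tags.get(TAG_KEY, DEFAULT_TIER)
        # Unknown tag values also fall back to the default tier.
        assignments[resource_id] = POLICIES.get(tier, POLICIES[DEFAULT_TIER])
    return assignments


inventory = {
    "arn:aws:s3:::payments-ledger": {"backup-tier": "tier1"},
    "arn:aws:s3:::build-cache": {"backup-tier": "tier3"},
    "arn:aws:s3:::untagged-bucket": {},
}
for rid, policy in assign_policies(inventory).items():
    print(rid, "->", policy.name)
```

Because the policy follows the tag, a resource created later by infrastructure-as-code is protected as soon as it carries the right tag, with no per-resource setup.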

Protecting the DevOps Environment

  • The DevOps environment, including code repositories, pipelines, and collaboration tools, is a critical but often overlooked area for data resilience
  • Rubrik's DevOps protection capabilities ensure these essential components can be recovered, enabling rapid recovery and business continuity in the event of an incident

Key Takeaways

  • Agentic AI introduces new risks and challenges that enterprises must address to enable successful AI adoption
  • Rubrik Agent Cloud, Rubrik's agent operations platform, provides monitoring, governance, and remediation for AI agents, including the ability to undo their destructive actions
  • Rubrik's cloud data resilience approach helps enterprises tier and protect their data by criticality and closes blind spots such as S3 data
  • Rubrik's cloud-native architecture and DevOps protection capabilities enable enterprises to build resilience across their entire digital ecosystem
