AWS re:Invent 2025 - Cloud Resilience in the AI Era (HMC307)
Navigating the AI Era: Ensuring Cloud Resilience
The Rise of Agentic AI and Enterprise Risks
Agentic AI refers to large language models (LLMs) with the ability to take actions on a user's behalf
Agentic AI is expected to drive significant gains in enterprise productivity and workflow automation
However, the rise of agentic AI also introduces new risks:
AI agents can operate at superhuman speeds, potentially causing 10x the damage in 1/10th the time
AI agents use non-human identities and credentials, operating in identity and access ecosystems that were not designed with AI agents in mind
Enterprises lack visibility and control over the AI agents running in their environments
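Read literally, the "10x the damage in 1/10th the time" figure means the rate of damage grows a hundredfold, not tenfold. With D and T standing for the damage and elapsed time of a comparable human-driven mistake (symbols introduced here only for illustration):

```latex
\frac{\text{damage}_{\text{agent}}}{\text{time}_{\text{agent}}}
  = \frac{10\,D}{T/10}
  = 100 \cdot \frac{D}{T}
```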
Key Concerns for Enterprises Adopting Agentic AI
Lack of a centralized view of all AI agents running in the environment
Uncertainty around the tools and data that AI agents can access
Difficulty in recovering from AI mistakes or unintended actions
Rubrik's Approach: The Agent Operations Platform
Rubrik has introduced Rubrik Agent Cloud, a platform that addresses the key challenges of agentic AI adoption:
1. Monitoring and Observability
Automatically discovers AI agents across the environment
Provides an inventory of agents, their credentials, and the tools/data they can access
2. Governance
Translates enterprise policies into software-enforced configurations
Ensures AI agents operate within defined boundaries
3. Remediation
Provides "AI Agent Rewind" functionality to undo destructive actions taken by AI agents
Leverages Rubrik's data backup and restoration capabilities to recover from AI-related incidents
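The talk does not show how the platform implements governance or rewind internally. As a generic illustration of the two ideas together (a policy check that runs before an agent's tool call, plus an undo journal that can reverse what the agent did), here is a minimal sketch. `AgentGateway`, `AgentPolicy`, `delete_record`, and the toy key-value store are all hypothetical names, not a vendor API:

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable

# An action performs its effect and returns a callable that reverses it.
Undo = Callable[[], None]
Action = Callable[[], Undo]


@dataclass(frozen=True)
class AgentPolicy:
    """Hypothetical software-enforced boundary: the tools one agent may call."""
    agent_id: str
    allowed_tools: frozenset[str]


@dataclass
class JournalEntry:
    agent_id: str
    tool: str
    undo: Undo


@dataclass
class AgentGateway:
    """Hypothetical choke point that every agent tool call passes through."""
    policies: dict[str, AgentPolicy]
    journal: list[JournalEntry] = field(default_factory=list)

    def invoke(self, agent_id: str, tool: str, action: Action) -> None:
        policy = self.policies.get(agent_id)
        # Governance: unknown agents and out-of-policy tools are refused before anything runs.
        if policy is None or tool not in policy.allowed_tools:
            raise PermissionError(f"{agent_id!r} is not allowed to call {tool!r}")
        undo = action()
        # Remediation: remember how to reverse what just happened.
        self.journal.append(JournalEntry(agent_id, tool, undo))

    def rewind(self, agent_id: str) -> int:
        """Undo all journaled actions of one agent, newest first; return how many."""
        kept: list[JournalEntry] = []
        undone = 0
        for entry in reversed(self.journal):
            if entry.agent_id == agent_id:
                entry.undo()
                undone += 1
            else:
                kept.append(entry)
        self.journal = kept[::-1]  # restore chronological order for the survivors
        return undone


# Usage: a toy key-value store that one agent is allowed to delete from.
store = {"invoice-1": "paid", "invoice-2": "open"}


def delete_record(key: str) -> Action:
    def action() -> Undo:
        old_value = store.pop(key)  # the destructive step

        def undo() -> None:
            store[key] = old_value  # put the deleted value back

        return undo

    return action


gateway = AgentGateway({"billing-agent": AgentPolicy("billing-agent", frozenset({"delete_record"}))})
gateway.invoke("billing-agent", "delete_record", delete_record("invoice-1"))
gateway.invoke("billing-agent", "delete_record", delete_record("invoice-2"))
gateway.rewind("billing-agent")  # store holds both invoices again
```

Reversing an agent's actions newest-first is the classic compensating-action (undo log) pattern. It assumes nothing else changed the same data in between; when an action cannot be cleanly reversed, restoring from a backup is the fallback, which is why the talk ties remediation to backup and restore.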
Ensuring Cloud Data Resilience
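Under the shared responsibility model, the cloud provider is responsible for the resilience of the cloud, while customers remain responsible for the resilience of their own data and workloads in the cloud; this split often leads to confusion and protection gaps in enterprise organizations
Rubrik helps enterprises address the different aspects of cloud data resilience: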
High availability: Designing for common failures and operational outages
Disaster recovery: Planning for regional outages and natural disasters
Cyber resilience: Protecting data from cyber attacks and enabling effective recovery
Tiering Data Resilience
Rubrik recommends a tiered approach to data resilience based on the importance and criticality of the data and workloads:
Tier 1 (mission-critical): Requires local, remote, and cyber-vaulted copies
Tier 2 (business-critical): Requires local and remote copies, with an optional cyber-vaulted copy
Tier 3 (less critical): May require only local copies, with the ability to easily reproduce the environment
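Expressed as data, the tier rules above can be checked automatically against what each workload actually has. A minimal sketch, assuming workloads are already labeled with a tier and with the kinds of copies they hold; the `Copy` enum, `audit` function, and example workload names are hypothetical, and Tier 3's reproducibility requirement is not modeled:

```python
from __future__ import annotations

from enum import Enum


class Copy(Enum):
    LOCAL = "local"
    REMOTE = "remote"
    CYBER_VAULT = "cyber-vaulted"


# Minimum copies per tier; Tier 2's vault copy is optional, so it is not required here.
REQUIRED_COPIES: dict[int, frozenset[Copy]] = {
    1: frozenset({Copy.LOCAL, Copy.REMOTE, Copy.CYBER_VAULT}),
    2: frozenset({Copy.LOCAL, Copy.REMOTE}),
    3: frozenset({Copy.LOCAL}),
}


def missing_copies(tier: int, existing: set[Copy]) -> set[Copy]:
    """Return the copy types a workload still needs to meet its tier."""
    return set(REQUIRED_COPIES[tier] - existing)


def audit(workloads: dict[str, tuple[int, set[Copy]]]) -> dict[str, set[Copy]]:
    """Report only the workloads that fall short of their tier's minimum."""
    report: dict[str, set[Copy]] = {}
    for name, (tier, copies) in workloads.items():
        gap = missing_copies(tier, copies)
        if gap:
            report[name] = gap
    return report
```

For example, `audit({"payments-db": (1, {Copy.LOCAL, Copy.REMOTE}), "wiki": (3, {Copy.LOCAL})})` returns `{"payments-db": {Copy.CYBER_VAULT}}`: the mission-critical database is missing its vaulted copy, while the Tier 3 wiki already meets its minimum.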
Addressing the S3 Data Resilience Blind Spot
S3 data growth is exponential, with increasing amounts of sensitive data stored in S3 buckets
Versioning and replication alone are not enough to protect against accidental deletions, malware, and recovery challenges at scale
Rubrik helps enterprises:
Discover sensitive data in S3 buckets
Provide immutable, cyber-vaulted backups of S3 data
Detect and quarantine malware in backup data
Enable rapid, reliable recovery from S3 data incidents
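Versioning keeps prior object versions, but recovering a bucket to a point in time still means working out, for every key, which version was current at that moment, and skipping keys whose newest entry by then was a delete marker. Across millions of objects this is the "recovery at scale" problem. It also only works if the old versions still exist: unless Object Lock or MFA Delete is in place, any principal allowed to call s3:DeleteObjectVersion can remove them permanently. A minimal sketch of the version-selection step, assuming the listing has already been fetched (for example with boto3's `list_object_versions` paginator); `versions_as_of` is a hypothetical helper, not the recovery mechanism described in the talk:

```python
from __future__ import annotations

from datetime import datetime


def versions_as_of(
    versions: list[dict], delete_markers: list[dict], as_of: datetime
) -> dict[str, str]:
    """Map each key to the VersionId that was current at `as_of`.

    `versions` and `delete_markers` follow the shape of the "Versions" and
    "DeleteMarkers" entries in an S3 ListObjectVersions response (Key,
    VersionId, LastModified). Keys that were deleted at `as_of`, or did not
    exist yet, are left out. Assumes no two entries for a key share a timestamp.
    """
    newest: dict[str, tuple[datetime, bool, str]] = {}
    tagged = [(v, False) for v in versions] + [(m, True) for m in delete_markers]
    for entry, is_delete_marker in tagged:
        modified = entry["LastModified"]
        if modified > as_of:
            continue  # written after the recovery point, e.g. a malicious overwrite
        key = entry["Key"]
        if key not in newest or modified > newest[key][0]:
            newest[key] = (modified, is_delete_marker, entry["VersionId"])
    return {
        key: version_id
        for key, (_, is_delete_marker, version_id) in newest.items()
        if not is_delete_marker
    }
```

The result tells a restore job which version to copy back for each key; objects written or deleted after the recovery point are ignored, which is exactly what a point-in-time rollback after a malware overwrite needs.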
Rubrik's Cloud-Native Architecture
Rubrik Security Cloud is a SaaS platform that provides a unified view of, and management for, data resilience across cloud, on-premises, and AI environments
Leverages cross-account IAM roles and least-privilege access principles for secure, automated data protection
Offers a restricted data processing account and a data bunker account for secure backup storage and processing
Provides tag-based automation for applying backup policies based on data criticality
Integrates with infrastructure-as-code for seamless recovery and environment redeployment
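Tag-based automation means the backup policy follows a resource's tags instead of being attached by hand, so new resources are protected as soon as they are tagged. A minimal sketch, assuming a hypothetical `data-tier` tag whose values match the tiers described earlier and hypothetical policy names `gold`, `silver`, and `bronze`; the talk does not specify the actual tag schema:

```python
from __future__ import annotations

# Hypothetical tag key and policy names chosen for illustration.
TIER_TAG_KEY = "data-tier"
POLICY_BY_TIER = {"1": "gold", "2": "silver", "3": "bronze"}
# Missing or unrecognized tags fall back to the strictest policy, so an
# unclassified resource is over-protected rather than silently unprotected.
FALLBACK_POLICY = "gold"


def tags_from_aws(tag_list: list[dict[str, str]]) -> dict[str, str]:
    """Convert AWS's [{"Key": ..., "Value": ...}] tag list into a plain dict."""
    return {tag["Key"]: tag["Value"] for tag in tag_list}


def assign_policies(resources: dict[str, dict[str, str]]) -> dict[str, str]:
    """Map each resource ID to a backup policy name based on its tier tag."""
    return {
        resource_id: POLICY_BY_TIER.get(tags.get(TIER_TAG_KEY, ""), FALLBACK_POLICY)
        for resource_id, tags in resources.items()
    }
```

In practice the tag lists would come from the resource APIs; EC2's DescribeInstances, for instance, returns tags in the Key/Value list shape that `tags_from_aws` converts.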
Protecting the DevOps Environment
The DevOps environment, including code repositories, pipelines, and collaboration tools, is a critical but often overlooked area for data resilience
Rubrik's DevOps protection capabilities ensure the recoverability of these essential components, enabling rapid recovery and business continuity in the event of an incident
Key Takeaways
Agentic AI introduces new risks and challenges that enterprises must address to enable successful AI adoption
Rubrik's Agent Operations Platform provides a comprehensive solution for monitoring, governing, and remediating AI agent-related incidents
Rubrik's cloud data resilience approach helps enterprises tier and protect their data based on criticality, addressing blind spots like S3 data
Rubrik's cloud-native architecture and DevOps protection capabilities enable enterprises to build resilience across their entire digital ecosystem