AWS re:Invent 2025 - How Allianz designed AIOps at enterprise scale (IND3321)
Summary of AWS re:Invent 2025 Presentation: "How Allianz Designed AIOps at Enterprise Scale"
Introduction to Challenges in Building AI/GenAI Platforms at Scale
Over the past 5-8 years, building ML/AI platforms has been a complex but relatively stable challenge
Platforms needed to provide capabilities for training models, experimentation, monitoring, inference, etc.
Tools like Amazon SageMaker helped simplify this "heavy lifting"
However, the explosion of GenAI capabilities in the last 2 years has added significant new complexity
New capabilities like prompt engineering, coding assistants, and agent frameworks are being rapidly adopted
This complexity is compounded by the proliferation of diverse frameworks and user groups building AI/GenAI applications
Allianz's Approach: Identifying Stable Elements and Decoupling Them
The goal is to provide a platform and operating model that doesn't lock users into specific frameworks
Instead, identify the "stable elements" that can be standardized and made part of the platform
Decouple these stable elements from the "fast-moving" components, such as frameworks, that change too quickly to standardize
This allows the platform to evolve and scale adoption of AI/GenAI without constant refactoring
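The decoupling idea above can be illustrated with a minimal Python sketch. It is not Allianz's implementation: the names `AgentFramework`, `EchoFramework`, and `Platform` are hypothetical, and the "framework" is a trivial stand-in. The point is only that the stable part (registration, audit logging, invocation) depends on a narrow interface, so a fast-moving framework can be swapped without touching it.

```python
from dataclasses import dataclass
from typing import Dict, List, Protocol, Tuple


class AgentFramework(Protocol):
    """Fast-moving component: anything that can answer a prompt."""

    def run(self, prompt: str) -> str: ...


@dataclass
class EchoFramework:
    """Hypothetical stand-in for a real agent framework."""

    prefix: str

    def run(self, prompt: str) -> str:
        return f"{self.prefix}: {prompt}"


class Platform:
    """Stable element: registration, audit logging, and invocation."""

    def __init__(self) -> None:
        self._frameworks: Dict[str, AgentFramework] = {}
        self.audit_log: List[Tuple[str, str]] = []

    def register(self, name: str, framework: AgentFramework) -> None:
        self._frameworks[name] = framework

    def invoke(self, name: str, prompt: str) -> str:
        # The audit trail is recorded the same way for every framework.
        self.audit_log.append((name, prompt))
        return self._frameworks[name].run(prompt)


platform = Platform()
platform.register("v1", EchoFramework(prefix="framework-a"))
platform.register("v2", EchoFramework(prefix="framework-b"))
result_a = platform.invoke("v1", "summarize claim")
result_b = platform.invoke("v2", "summarize claim")
```

Replacing `EchoFramework` with an adapter around a real framework would leave `Platform` unchanged, which is the property the talk attributes to decoupled stable elements.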
Allianz's AIOps Platform: Key Components
Data Science Workbench
Based on Amazon SageMaker, provides self-service access to ML/AI tools for different personas
Includes JupyterLab, Python, R, and no-code options like SageMaker Canvas
Enables rapid experimentation and model development
Prompt Management and Evolution
Recognizes that prompt engineering is a key part of building GenAI applications
Provides a Git-based platform for managing and evolving prompts
Automates the process of versioning, branching, and deploying prompts
Gives business users a visual interface to interact with the prompt management system
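As a rough illustration of why Git suits prompt versioning, the sketch below treats each prompt as a file whose version is its Git blob ID, so any change to the wording yields a new, reproducible identifier. The `PromptVersion` class, the prompt texts, and the use of `string.Template` placeholders are assumptions made for this example; the talk does not describe the platform's internal format.

```python
import hashlib
from dataclasses import dataclass
from string import Template


def git_blob_id(content: bytes) -> str:
    """Return the SHA-1 object ID Git assigns to a blob with this content."""
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


@dataclass(frozen=True)
class PromptVersion:
    """Hypothetical wrapper around one prompt file stored in a Git repository."""

    name: str
    text: str

    @property
    def version_id(self) -> str:
        return git_blob_id(self.text.encode("utf-8"))

    def render(self, **variables: str) -> str:
        # substitute() raises KeyError if a placeholder has no value.
        return Template(self.text).substitute(**variables)


draft = PromptVersion("claim-summary", "Summarize this claim: $claim")
revised = PromptVersion("claim-summary", "Summarize this claim in $language: $claim")

rendered = revised.render(language="English", claim="water damage, kitchen")
```

Because the identifier is derived from content alone, an application can pin a specific `version_id` and a deployment step can verify that the prompt it ships is the one that was reviewed.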
Decoupled Runtime Options
Allows teams to choose the appropriate runtime for their needs (e.g., AWS Lambda, Amazon ECS, Amazon EKS)
Uses open standards like containers and OpenTelemetry for portability and observability
Provides a clear handover process to bring applications into production
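One way to keep an application portable across these runtimes is to isolate the business logic from the runtime-specific entry points. The sketch below is an assumption-laden illustration, not the platform's code: `handle_request` and `container_entrypoint` are hypothetical names, and the Lambda event is assumed to carry a JSON string under `"body"`, as in an API Gateway proxy integration.

```python
import json
from typing import Any, Dict


def handle_request(payload: Dict[str, Any]) -> Dict[str, Any]:
    """Runtime-independent business logic (hypothetical example)."""
    text = str(payload.get("text", ""))
    return {"word_count": len(text.split())}


def lambda_handler(event: Dict[str, Any], context: Any) -> Dict[str, Any]:
    """Thin adapter for AWS Lambda's handler(event, context) signature."""
    body = json.loads(event.get("body") or "{}")
    return {"statusCode": 200, "body": json.dumps(handle_request(body))}


def container_entrypoint(raw_body: bytes) -> bytes:
    """Thin adapter that a web server in a container (ECS or EKS) could call."""
    return json.dumps(handle_request(json.loads(raw_body))).encode("utf-8")


event = {"body": json.dumps({"text": "policy renewal request"})}
lambda_response = lambda_handler(event, None)
container_response = container_entrypoint(b'{"text": "policy renewal request"}')
```

Both adapters return the same result for the same input, so moving a workload between runtimes changes only the thin wrapper.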
Dual Pathway Approach
"Happy path" provides a fully integrated, opinionated data science workbench
"Self-managed" path allows teams more flexibility, with clear handover standards
Balances the need for governance and security with the need for agility and experimentation
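The summary does not list the actual handover standards, so the following sketch is purely illustrative. It assumes a hypothetical `HandoverManifest` that a self-managed team might submit, with checks loosely based on the open standards named elsewhere in the talk (containers, OpenTelemetry, Git). The image-tag check is deliberately simplistic.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class HandoverManifest:
    """Hypothetical description a self-managed team submits at handover."""

    owner_team: str
    container_image: str
    emits_opentelemetry: bool
    prompts_in_git: bool


def handover_gaps(manifest: HandoverManifest) -> List[str]:
    """Return human-readable gaps; an empty list means ready for handover."""
    gaps: List[str] = []
    if not manifest.owner_team.strip():
        gaps.append("owner_team is empty")
    # Look only at the final path segment so a registry port is not mistaken for a tag.
    image_name = manifest.container_image.rsplit("/", 1)[-1]
    if ":" not in image_name or image_name.endswith(":latest"):
        gaps.append("container_image is not pinned to a specific tag or digest")
    if not manifest.emits_opentelemetry:
        gaps.append("application does not emit OpenTelemetry data")
    if not manifest.prompts_in_git:
        gaps.append("prompts are not managed in Git")
    return gaps


ready = HandoverManifest("claims-ai", "registry.example/claims/summarizer:1.4.2", True, True)
not_ready = HandoverManifest("claims-ai", "registry.example/claims/summarizer:latest", False, True)

ready_gaps = handover_gaps(ready)
not_ready_gaps = handover_gaps(not_ready)
```

A check like this could run in a pipeline, giving self-managed teams fast feedback while the platform team keeps a single, explicit definition of "ready for production".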
Key Takeaways and Business Impact
Focused on building on "low-regret" standards that are already established in the organization
Leveraged open standards like Git, containers, and OpenTelemetry to decouple components
Enabled rapid experimentation and evolution of GenAI applications while maintaining governance
Empowered diverse user groups, from data scientists to business users, to build AI/GenAI solutions
Demonstrated how an enterprise-scale AIOps platform can accelerate AI/GenAI adoption
Technical Details and Examples
Used Amazon SageMaker as the foundation for the data science workbench
Integrated with AWS services like Textract, Comprehend, and Translate for common AI/GenAI tasks
Automated the deployment of infrastructure-as-code templates via a conversational agent
Leveraged Git for prompt management, with automated versioning, branching, and deployment
Utilized OpenTelemetry for end-to-end observability and cost tracking of AI/GenAI workloads
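To show how trace data can feed cost tracking, the sketch below aggregates token usage from span attributes into a cost per team. It uses plain dictionaries rather than the OpenTelemetry SDK so that it runs without extra packages. The `gen_ai.*` attribute names follow the OpenTelemetry semantic conventions for generative AI, which are still evolving; the `app.team` attribute, the model name, and the prices are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Mapping, Union

AttributeValue = Union[str, int]

# Hypothetical prices in USD per 1,000 tokens; real prices vary by model and region.
PRICE_PER_1K_TOKENS: Dict[str, Dict[str, float]] = {
    "model-a": {"input": 0.003, "output": 0.015},
}


def cost_by_team(spans: Iterable[Mapping[str, AttributeValue]]) -> Dict[str, float]:
    """Sum the estimated cost of each span's token usage, grouped by team."""
    totals: Dict[str, float] = defaultdict(float)
    for attrs in spans:
        price = PRICE_PER_1K_TOKENS[str(attrs["gen_ai.request.model"])]
        input_tokens = int(attrs.get("gen_ai.usage.input_tokens", 0))
        output_tokens = int(attrs.get("gen_ai.usage.output_tokens", 0))
        cost = (input_tokens * price["input"] + output_tokens * price["output"]) / 1000
        totals[str(attrs.get("app.team", "unknown"))] += cost
    return dict(totals)


spans = [
    {"app.team": "claims", "gen_ai.request.model": "model-a",
     "gen_ai.usage.input_tokens": 1000, "gen_ai.usage.output_tokens": 1000},
    {"app.team": "claims", "gen_ai.request.model": "model-a",
     "gen_ai.usage.input_tokens": 2000, "gen_ai.usage.output_tokens": 0},
    {"app.team": "underwriting", "gen_ai.request.model": "model-a",
     "gen_ai.usage.input_tokens": 500, "gen_ai.usage.output_tokens": 200},
]
costs = cost_by_team(spans)
```

Since every runtime emits the same attributes, the same aggregation works regardless of where a workload runs.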
Business Impact and Use Cases
Enabled Allianz to rapidly process and extract insights from large volumes of documents (e.g., insurance forms)
Empowered business users to build and iterate on GenAI-powered chatbots and agents without deep technical expertise
Accelerated the development and deployment of AI-powered applications across the organization
Provided a scalable and secure platform to support AI/GenAI initiatives, from small experiments to enterprise-wide deployments