AWS re:Invent 2025 - Unified multicloud observability (COP368)
Unified Multicloud Observability: Patterns and Strategies
Multicloud Adoption and Challenges
AWS defines multicloud as the use of at least two cloud service providers to operate IT solutions and workloads.
Customers adopt multicloud for differentiated capabilities, mergers and acquisitions, or regulatory compliance.
Key multicloud observability challenges include:
Complexity: Multiple tools, service dependencies, and time-consuming troubleshooting across dashboards.
Process overhead: More instrumentation work, and team members whose knowledge is limited to their own tooling.
Scale: Increased telemetry data volume leading to ingestion latency, query performance issues, and escalating data egress costs.
Observability Foundations
Observability aims to help teams detect, investigate, and remediate issues, which reduces mean time to resolution and improves understanding of system behavior.
The core building blocks are metrics, logs, and traces collected from applications.
Key observability strategy questions include:
How to instrument workloads consistently across clouds to gather telemetry?
What signals to prioritize and retain as data scales?
What storage strategies to use for cost-effective and performant data storage?
What visualization tools to use to make sense of the observability data?
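One way to approach the question of which signals to retain is a sampling rule applied before storage. The sketch below is an illustration only, not something presented in the talk: it keeps every error and every slow trace, and keeps a fixed fraction of the rest by hashing the trace ID so that collectors in different clouds reach the same decision for the same trace. The function name `keep_trace`, the 1000 ms threshold, and the 10% sample rate are all assumptions chosen for the example.

```python
import hashlib


def keep_trace(
    trace_id: str,
    is_error: bool,
    duration_ms: float,
    slow_threshold_ms: float = 1000.0,  # assumed threshold, tune per workload
    sample_rate: float = 0.1,           # assumed fraction of ordinary traces to keep
) -> bool:
    """Return True if this trace should be retained."""
    # Errors and slow requests are the most useful for troubleshooting: always keep them.
    if is_error or duration_ms >= slow_threshold_ms:
        return True
    # Map the trace ID to a stable value in [0, 1). Because the value depends
    # only on the ID, every collector makes the same keep/drop decision.
    digest = hashlib.sha256(trace_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:4], "big") / 2**32
    return bucket < sample_rate
```

Because the decision depends only on the trace ID, spans of one request that land in different clouds are either all kept or all dropped, which preserves end-to-end traces.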
Centralized Collection Pattern
Collects telemetry from multiple cloud providers into a centralized location for unified observability.
Key components:
Collector agent on compute workloads to gather telemetry.
Ingestion layer for reliable data transfer, buffering, and back-pressure handling.
Normalization layer to convert data to a common format.
Storage layer to hold the full request context for querying.
Visualization layer for global application view and troubleshooting.
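To make the normalization layer concrete, here is a minimal sketch that converts records from two providers into one common shape. The input record layouts are invented for illustration and are not the real payload formats of any AWS or Google Cloud API; `MetricPoint`, `from_aws_like`, `from_gcp_like`, and `normalize` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MetricPoint:
    """Common format that every source is converted into."""
    cloud: str
    service: str
    name: str
    value: float
    timestamp_s: float  # seconds since the Unix epoch


def from_aws_like(record: dict) -> MetricPoint:
    # Assumed (hypothetical) input shape with a millisecond timestamp.
    return MetricPoint(
        cloud="aws",
        service=record["Dimensions"]["ServiceName"],
        name=record["MetricName"],
        value=float(record["Value"]),
        timestamp_s=record["TimestampMs"] / 1000.0,
    )


def from_gcp_like(record: dict) -> MetricPoint:
    # Assumed (hypothetical) input shape with a timestamp already in seconds.
    return MetricPoint(
        cloud="gcp",
        service=record["labels"]["service"],
        name=record["metric"],
        value=float(record["point"]),
        timestamp_s=float(record["epoch_seconds"]),
    )


# One converter per source; adding a provider means adding one entry here.
NORMALIZERS = {"aws": from_aws_like, "gcp": from_gcp_like}


def normalize(source: str, record: dict) -> MetricPoint:
    """Convert a provider-specific record into the common MetricPoint format."""
    try:
        convert = NORMALIZERS[source]
    except KeyError:
        raise ValueError(f"no normalizer registered for source {source!r}") from None
    return convert(record)
```

Once every record carries the same field names and units, the storage and visualization layers can treat data from all providers uniformly.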
Open-source implementation example:
Collector: OpenTelemetry Collector
Ingestion: Kong API Gateway, Kafka
Normalization: OpenSearch Data Prepper, Prometheus
Visualization: Grafana
Cloud-native implementation example:
Collector: Amazon CloudWatch Agent
Storage: Amazon CloudWatch
Visualization: Amazon CloudWatch Dashboards
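The buffering and back-pressure handling listed among the key components can be illustrated with a bounded in-memory queue. This is a simplified sketch using only the Python standard library, not how Kafka or any managed service implements it; the class name `IngestionBuffer` and its methods are hypothetical.

```python
import queue


class IngestionBuffer:
    """Bounded buffer between collectors and the storage writer."""

    def __init__(self, max_items: int) -> None:
        # A bounded queue is what creates back-pressure: once it is full,
        # producers are told to wait instead of memory growing without limit.
        self._queue: queue.Queue = queue.Queue(maxsize=max_items)

    def offer(self, item: dict, timeout_s: float = 0.0) -> bool:
        """Try to enqueue an item; return False if the buffer stayed full.

        A False result is the back-pressure signal: the caller should slow
        down, retry later, or spill to local disk.
        """
        try:
            if timeout_s > 0:
                self._queue.put(item, block=True, timeout=timeout_s)
            else:
                self._queue.put_nowait(item)
            return True
        except queue.Full:
            return False

    def drain(self, max_batch: int) -> list:
        """Remove and return up to max_batch items for the downstream writer."""
        batch = []
        while len(batch) < max_batch:
            try:
                batch.append(self._queue.get_nowait())
            except queue.Empty:
                break
        return batch
```

A collector would call `offer` for each telemetry record and back off when it returns `False`, while a writer loop calls `drain` and forwards each batch to storage.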
Federated Query Pattern
Allows querying data across multiple cloud providers while keeping data in its original location.
Key components:
Federation layer that splits queries into subqueries for each cloud provider.
Metadata catalog to store schema information for query planning.
Connectors to execute subqueries in each cloud and aggregate the results.
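The split, execute, and aggregate flow described by these components can be sketched in a few lines. The example below is a toy model under stated assumptions: each "connector" is a plain function over in-memory rows rather than a real Trino or Athena connector, and the only supported query is "count ERROR log rows for a service". All names (`make_connector`, `federated_error_count`) are hypothetical.

```python
from typing import Callable, Dict, List, Tuple

Row = Dict[str, str]
# A connector runs one subquery inside a single cloud and returns its result.
Connector = Callable[[str], int]


def make_connector(rows: List[Row]) -> Connector:
    """Build a stand-in connector over one cloud's log rows."""
    def count_errors(service: str) -> int:
        # The subquery executes where the data lives; only the count leaves the cloud.
        return sum(
            1 for r in rows
            if r["service"] == service and r["level"] == "ERROR"
        )
    return count_errors


def federated_error_count(
    connectors: Dict[str, Connector], service: str
) -> Tuple[Dict[str, int], int]:
    """Fan the query out to every cloud, then aggregate the partial results."""
    per_cloud = {cloud: run(service) for cloud, run in connectors.items()}
    total = sum(per_cloud.values())
    return per_cloud, total
```

Only the small partial results cross cloud boundaries here, which is the property that keeps data in its original location; a real federation engine would additionally consult the metadata catalog to plan which subqueries each source can answer.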
Open-source implementation example:
Storage: Amazon S3, Google Cloud Storage
Federation: Trino, Hive Metastore
Visualization: Grafana
Cloud-native implementation example:
Storage: Amazon S3, Google Cloud Storage
Federation: Amazon Athena, AWS Glue
Visualization: Amazon Managed Grafana
Pattern Comparison and Tradeoffs
Centralized Collection:
Optimized for speed and simplicity, enabling faster queries and easier governance.
Increases storage costs.
Federated Query:
Provides flexibility and data freshness, with data remaining in its original location.
Introduces query latency and more complex orchestration.
Customer Story: Phillips 66
Global energy company with 70% of workloads in AWS and footprints in multiple clouds and on-premises.
Implemented the Centralized Collection pattern using OpenTelemetry, Amazon Managed Service for Prometheus, and Amazon Managed Grafana.
Achieved a 30% faster mean time to resolution.
Next Steps and Resources
Attend related breakout sessions at AWS re:Invent 2025 for deeper multicloud expertise.
Visit the Multicloud Kiosk at the re:Invent Village for live demos, use case discussions, and expert guidance.
Leverage resources like whitepapers, blog posts, and architectural patterns to strategize, design, and optimize multicloud observability.