AWS re:Invent 2025 - From Metrics to Management: Practical Observability on EKS (DEV202)
Summary of "From Metrics to Management: Practical Observability on EKS" (AWS re:Invent 2025)
Introduction
Presenters: Dale Orers (Software Engineer) and Emil Labinsky (AWS DevOps Engineer)
Agenda: Define observability, discuss making applications observable, explore AWS services for observability (Amazon Managed Service for Prometheus, Amazon Managed Grafana)
Understanding Observability
Observability is about extracting actionable insights to assess and improve application performance, health, and behavior
Observability is a continuous cycle: detect issues through metrics, investigate, remediate, and assess the improvement
Key benefits of observability:
Gain visibility into application health
Improve troubleshooting and issue resolution
Deliver a superior customer experience
Control costs
Observability Signals
Metrics: Indicate the presence of a problem (e.g., the application responding 25% slower)
Traces: Identify the location of a problem (e.g., by following a request across dependent services)
Logs: Determine the cause of a problem (e.g., a database startup issue after an update)
Profiles: Provide insight into code behavior (e.g., a potential memory leak)
Together, these signals enable better SLO/SLA compliance
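SLO compliance ultimately comes down to arithmetic on signals like these. As a minimal sketch (not from the talk), assuming an availability SLI defined as the fraction of successful requests and a hypothetical 99.9% target, the following Python computes the SLI and how much of the error budget remains:

```python
"""Sketch: checking an availability SLO from request counts.

Assumptions (not from the talk): the SLI is the fraction of successful
requests, the SLO target is 99.9%, and the counts are illustrative.
"""
from dataclasses import dataclass


@dataclass
class SloReport:
    sli: float               # observed fraction of good requests
    budget_total: float      # bad requests the SLO allows in the window
    budget_remaining: float  # allowed bad requests not yet consumed (negative = overspent)
    compliant: bool


def evaluate_slo(good: int, total: int, target: float) -> SloReport:
    if total <= 0:
        raise ValueError("total must be positive")
    if not 0.0 < target < 1.0:
        raise ValueError("target must be in (0, 1)")
    sli = good / total
    budget_total = (1.0 - target) * total   # e.g. 0.1% of all requests
    bad = total - good
    return SloReport(sli, budget_total, budget_total - bad, sli >= target)


if __name__ == "__main__":
    report = evaluate_slo(good=999_200, total=1_000_000, target=0.999)
    print(f"SLI={report.sli:.4%} remaining budget={report.budget_remaining:.0f} "
          f"requests compliant={report.compliant}")
```

Here 800 failed requests consume part of a 1,000-request budget, so the SLO holds with roughly 200 failures to spare.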
Making Applications Observable
Amazon Managed Service for Prometheus
Fully managed, serverless Prometheus-compatible monitoring service
Uses the Prometheus data model and query language (PromQL)
Enables seamless monitoring of containerized workloads
Best practices:
Use AWS PrivateLink and IAM to secure data transfer
Tune scrape intervals and use relabeling to drop unneeded series, reducing costs
Adjust retention period based on data requirements
Run multiple collector containers for high availability
Use consistent identifiers for data correlation
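The scrape-interval, relabeling, and retention practices above translate into simple volume arithmetic. The Python sketch below is illustrative only: `drop_matching` mimics a Prometheus `drop` relabel rule (source label values joined with `;`, regex fully anchored; Prometheus uses RE2 syntax, which Python's `re` matches for simple patterns like these), and the counts ignore pricing entirely. All series names, regexes, and intervals are hypothetical.

```python
"""Sketch of two cost levers for Prometheus ingestion.

1. Prometheus-style `drop` relabeling: join the values of `source_labels`
   with `;` and drop the series if the fully anchored regex matches.
2. Volume arithmetic: samples per day = series * 86400 / scrape interval.

Series, regexes, and intervals are hypothetical examples.
"""
import re
from typing import Dict, List

Labels = Dict[str, str]


def drop_matching(series: List[Labels], source_labels: List[str],
                  regex: str, separator: str = ";") -> List[Labels]:
    pattern = re.compile(regex)
    kept = []
    for labels in series:
        # A missing label contributes an empty string, as in Prometheus.
        value = separator.join(labels.get(name, "") for name in source_labels)
        if not pattern.fullmatch(value):  # relabel regexes are anchored
            kept.append(labels)
    return kept


def samples_per_day(num_series: int, scrape_interval_s: float) -> float:
    return num_series * 86_400 / scrape_interval_s


def samples_retained(num_series: int, scrape_interval_s: float,
                     retention_days: int) -> float:
    # Samples held over the retention window: a rough proxy for storage.
    return samples_per_day(num_series, scrape_interval_s) * retention_days


if __name__ == "__main__":
    series = [
        {"__name__": "http_requests_total", "pod": "web-2048-a"},
        {"__name__": "go_gc_duration_seconds", "pod": "web-2048-a"},
        {"__name__": "go_goroutines", "pod": "web-2048-a"},
    ]
    kept = drop_matching(series, ["__name__"], "go_.*")
    print([s["__name__"] for s in kept])
    print(samples_per_day(1_000, 15), samples_per_day(1_000, 60))
    print(samples_retained(1_000, 60, 30))
```

With 1,000 series, moving from a 15-second to a 60-second interval cuts daily samples from 5,760,000 to 1,440,000, a 4x reduction; dropping series shrinks the other factor in the same formula, and a shorter retention window shrinks what is stored.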
Amazon Managed Grafana
Fully managed service for Grafana, the open-source analytics and visualization platform
Integrates with Amazon Managed Service for Prometheus
Best practices for dashboards:
Each dashboard should tell a clear, specific story that answers one question
Design dashboards to be simple and responsive
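To make the "one dashboard, one question" guideline concrete, here is a hedged Python sketch that emits a small dashboard in Grafana's JSON model. It uses only a subset of fields (`title`, `time`, `templating`, `panels` with `gridPos` and `targets`) and assumes cAdvisor and kube-state-metrics series are being scraped; the question, panel titles, and the `namespace` variable are hypothetical, and field details should be checked against your Grafana version.

```python
"""Sketch: a small, single-question Grafana dashboard as JSON.

Uses a subset of Grafana's dashboard JSON model. Metric names assume
cAdvisor and kube-state-metrics are scraped; titles and layout are
hypothetical.
"""
import json


def panel(title: str, expr: str, x: int, y: int) -> dict:
    # One time-series panel driven by a single PromQL query.
    return {
        "type": "timeseries",
        "title": title,
        "gridPos": {"h": 8, "w": 12, "x": x, "y": y},
        "targets": [{"refId": "A", "expr": expr}],
    }


def build_dashboard() -> dict:
    return {
        "title": "Is web-2048 healthy right now?",   # the one question it answers
        "time": {"from": "now-1h", "to": "now"},
        "templating": {"list": [{
            "name": "namespace",                     # filter reused by every panel
            "type": "query",
            "query": "label_values(kube_pod_info, namespace)",
        }]},
        "panels": [
            panel("CPU usage by pod (cores)",
                  'sum by (pod) (rate(container_cpu_usage_seconds_total'
                  '{namespace="$namespace"}[5m]))', 0, 0),
            panel("Container restarts (last 1h)",
                  'sum by (pod) (increase(kube_pod_container_status_restarts_total'
                  '{namespace="$namespace"}[1h]))', 12, 0),
        ],
    }


if __name__ == "__main__":
    print(json.dumps(build_dashboard(), indent=2))
```

Keeping the panel count small and scoping every query to one `$namespace` variable is what keeps such a dashboard both focused and quick to render.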
Demonstration
Provisioned a Kubernetes cluster with a sample application (web-2048)
Configured Amazon Managed Service for Prometheus:
Created a scraper to collect Prometheus metrics
Defined a workspace to store and query the collected data
Set appropriate retention period for cost management
Configured Amazon Managed Grafana:
Created a data source pointing to Amazon Managed Service for Prometheus
Designed dashboards to visualize cluster and application metrics
Applied filters and labels to focus on specific data of interest
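Filtering by labels is what narrows a dashboard to the data of interest. The Python sketch below models PromQL's four label-matcher operators (`=`, `!=`, `=~`, `!~`) over in-memory series, assuming Prometheus semantics (regex matchers fully anchored, a missing label treated as an empty string). The series and label values are hypothetical, and in practice the filtering happens inside Amazon Managed Service for Prometheus, not in client code.

```python
"""Sketch: how PromQL-style label matchers narrow a set of series.

Regex matchers are fully anchored and a missing label counts as an
empty string, following Prometheus. Series and values are hypothetical.
"""
import re
from typing import Dict, List, Tuple

Labels = Dict[str, str]
Matcher = Tuple[str, str, str]  # (label name, operator, value)


def matches(labels: Labels, matcher: Matcher) -> bool:
    name, op, value = matcher
    actual = labels.get(name, "")
    if op == "=":
        return actual == value
    if op == "!=":
        return actual != value
    if op == "=~":
        return re.fullmatch(value, actual) is not None
    if op == "!~":
        return re.fullmatch(value, actual) is None
    raise ValueError(f"unknown operator: {op}")


def select(series: List[Labels], matchers: List[Matcher]) -> List[Labels]:
    # A series is selected only if every matcher holds (logical AND).
    return [s for s in series if all(matches(s, m) for m in matchers)]


if __name__ == "__main__":
    cpu = "container_cpu_usage_seconds_total"
    series = [
        {"__name__": cpu, "namespace": "web-2048", "pod": "web-2048-7d9f"},
        {"__name__": cpu, "namespace": "kube-system", "pod": "coredns-1"},
        {"__name__": cpu, "namespace": "web-2048", "pod": "web-2048-5b2c"},
    ]
    # Analogous to: container_cpu_usage_seconds_total{namespace="web-2048", pod=~"web-2048-.*"}
    picked = select(series, [("__name__", "=", cpu),
                             ("namespace", "=", "web-2048"),
                             ("pod", "=~", "web-2048-.*")])
    for s in picked:
        print(s["pod"])
```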
Key Takeaways
Observability is a critical practice for ensuring application health, performance, and customer experience
Amazon Managed Service for Prometheus and Amazon Managed Grafana provide fully managed, scalable solutions for implementing observability
Best practices around scrape intervals, relabeling, retention, and dashboard design help optimize observability for cost and effectiveness
Comprehensive observability, leveraging metrics, traces, logs, and profiles, enables better troubleshooting, SLO/SLA compliance, and overall application management