AWS re:Invent 2025 - From Metrics to Management: Practical Observability on EKS (DEV202)

Summary of "From Metrics to Management: Practical Observability on EKS" (AWS re:Invent 2025)

Introduction

  • Presenters: Dale Orers (Software Engineer) and Emil Labinsky (AWS DevOps Engineer)
  • Agenda: Define observability, discuss making applications observable, explore AWS services for observability (Amazon Managed Service for Prometheus, Amazon Managed Grafana)

Understanding Observability

  • Observability is about extracting actionable insights to assess and improve application performance, health, and behavior
  • Observability is a continuous cycle: detect problems through metrics, investigate them, remediate the issues, and assess the resulting improvements
  • Key benefits of observability:
    • Gain visibility into application health
    • Improve troubleshooting and issue resolution
    • Deliver superior customer experience
    • Control costs

Observability Signals

  • Metrics: Indicate the presence of a problem (e.g. the application is 25% slower)
  • Traces: Identify the location of a problem (e.g. dependencies between services)
  • Logs: Determine the cause of a problem (e.g. database startup issue after update)
  • Profiles: Provide insight into code behavior (e.g. potential memory leak)
  • Together, these signals enable better SLO/SLA compliance

Making Applications Observable

Amazon Managed Service for Prometheus

  • Fully managed, serverless Prometheus-compatible monitoring service
  • Uses Prometheus data model and querying language (PromQL)
  • Enables seamless monitoring of containerized workloads
  • Best practices:
    • Use AWS PrivateLink and IAM for secure data transfer
    • Tune the scrape interval and use relabel configs to drop unneeded metrics, reducing ingestion costs
    • Adjust retention period based on data requirements
    • Run multiple collector containers for high availability
    • Use consistent identifiers for data correlation
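
To make the cost advice about scrape intervals and relabeling concrete, here is a minimal sketch of the arithmetic. It assumes ingestion volume scales with the number of samples, where each kept series yields one sample per scrape. The function name, the series count, and the fraction of series dropped by relabeling are hypothetical illustration values, not figures from the talk.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def samples_per_day(active_series: int, scrape_interval_s: int,
                    dropped_fraction: float = 0.0) -> int:
    """Estimate samples ingested per day (hypothetical helper).

    dropped_fraction is the share of series removed by relabel rules
    before ingestion; 0.0 means nothing is dropped.
    """
    if scrape_interval_s <= 0:
        raise ValueError("scrape_interval_s must be positive")
    if not 0.0 <= dropped_fraction <= 1.0:
        raise ValueError("dropped_fraction must be between 0 and 1")
    kept_series = round(active_series * (1.0 - dropped_fraction))
    scrapes_per_day = SECONDS_PER_DAY // scrape_interval_s
    return kept_series * scrapes_per_day


# Assumed example: 10,000 series scraped every 15 seconds.
baseline = samples_per_day(10_000, 15)
# Same workload at a 60-second interval with 25% of series dropped.
tuned = samples_per_day(10_000, 60, dropped_fraction=0.25)

print(baseline)          # 57600000
print(tuned)             # 10800000
print(tuned / baseline)  # 0.1875
```

Under these assumptions, quadrupling the interval and dropping a quarter of the series cuts ingested samples to under a fifth of the baseline. The trade-off is coarser resolution, so the interval should still match how quickly a problem needs to be detected.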

Amazon Managed Grafana

  • Fully managed service for Grafana, the open-source analytics and visualization platform
  • Integrates with Amazon Managed Service for Prometheus
  • Best practices for dashboards:
    • Each dashboard should tell a clear, specific story to answer a question
    • Design dashboards to be simple and responsive

Demonstration

  • Provisioned a Kubernetes cluster with sample application (web-2048)
  • Configured Amazon Managed Service for Prometheus:
    • Created a scraper to collect Prometheus metrics
    • Defined a workspace to store and query the collected data
    • Set appropriate retention period for cost management
  • Configured Amazon Managed Grafana:
    • Created a data source pointing to Amazon Managed Service for Prometheus
    • Designed dashboards to visualize cluster and application metrics
    • Applied filters and labels to focus on specific data of interest
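
The label filtering in the demo can be sketched outside Grafana as a PromQL query against the workspace's Prometheus-compatible query endpoint. The example below only builds the query string and URL. The workspace ID, region, and namespace label value are hypothetical placeholders, and it assumes the scraper collects the cAdvisor metric container_cpu_usage_seconds_total. A real request would also need to be signed with AWS Signature Version 4, which this sketch leaves out.

```python
from urllib.parse import urlencode


def selector(metric: str, labels: dict[str, str]) -> str:
    """Build a PromQL selector with exact-match label filters."""
    def escape(value: str) -> str:
        # Escape backslashes and double quotes inside label values.
        return value.replace("\\", "\\\\").replace('"', '\\"')

    matchers = ",".join(
        f'{name}="{escape(value)}"' for name, value in sorted(labels.items())
    )
    return f"{metric}{{{matchers}}}"


def amp_query_url(region: str, workspace_id: str, promql: str) -> str:
    """Build the instant-query URL for a workspace (request signing not included)."""
    base = (
        f"https://aps-workspaces.{region}.amazonaws.com"
        f"/workspaces/{workspace_id}/api/v1/query"
    )
    return f"{base}?{urlencode({'query': promql})}"


# Hypothetical values: the namespace, region, and workspace ID are placeholders.
cpu = selector("container_cpu_usage_seconds_total", {"namespace": "web-2048"})
promql = f"sum(rate({cpu}[5m]))"
url = amp_query_url("us-east-1", "ws-EXAMPLE", promql)

print(promql)
print(url)
```

In Grafana the same PromQL expression would go into a panel's query field, with the label values typically supplied by dashboard variables rather than hard-coded.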

Key Takeaways

  • Observability is a critical practice for ensuring application health, performance, and customer experience
  • Amazon Managed Service for Prometheus and Amazon Managed Grafana provide fully managed, scalable solutions for implementing observability
  • Best practices around scrape intervals, relabeling, retention, and dashboard design help optimize observability for cost and effectiveness
  • Comprehensive observability, leveraging metrics, traces, logs, and profiles, enables better troubleshooting, SLO/SLA compliance, and overall application management
