Optimizing performance and cost with AWS Graviton (DEV302)

Context and Motivation

  • Honeycomb is a software-as-a-service company that enables developers to get better insights into their systems using big data.
  • Honeycomb has a strong focus on performance and aims to return 99.5% of queries in under 10 seconds.
  • Honeycomb is a 100% AWS shop, ingesting signals from AWS and customer applications, and storing data using a custom columnar format.
  • Honeycomb has grown significantly over the past 5 years, with a 20x increase in trace spans per second and a 20x increase in weekly queries.

The Evolution of CPU Architectures

  • In the 1980s, mainframes had bespoke architectures, and software had to be specifically developed for each platform.
  • The rise of desktop computing led to the dominance of the x86 and 68k architectures, and porting software between them was expensive.
  • The mobile revolution led to the rise of ARM 32-bit chips, which were more power-efficient.
  • The need for a powerful and cost-effective alternative to x86-64 led to the development of ARM 64-bit processors, such as AWS Graviton.

Honeycomb's Graviton Migration Journey

Evaluating the Risks and Benefits

  • Honeycomb evaluated the potential benefits of Graviton, including better cost, performance, and environmental impact.
  • They also identified the potential risks and worked to mitigate them, for example by protecting the user experience and defining clear success criteria up front.
  • Honeycomb also considered the impact on their team's bandwidth and the long-term viability of the ARM architecture.

Porting the Software to ARM

  • Honeycomb had to produce build artifacts for the ARM architecture, which required changes to their CI tooling.
  • The use of the Go programming language made the porting process relatively straightforward, as it supported multiple architectures.
  • However, they ran into challenges when building Docker images for a non-native architecture, which required emulation with QEMU.
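
As a minimal sketch of why Go eases this kind of port (not Honeycomb's actual build code): the Go toolchain cross-compiles the same source for another architecture when the GOOS and GOARCH environment variables are set, and the resulting binary can report its target through runtime.GOARCH. The helper name describeArch is hypothetical and exists only for illustration.

```go
package main

import (
	"fmt"
	"runtime"
)

// describeArch maps a Go architecture name to a human-readable label.
// Hypothetical helper for illustration only.
func describeArch(goarch string) string {
	switch goarch {
	case "arm64":
		return "64-bit ARM (e.g. AWS Graviton)"
	case "amd64":
		return "x86-64"
	default:
		return "other: " + goarch
	}
}

func main() {
	// runtime.GOOS and runtime.GOARCH reflect the target this binary was built for.
	fmt.Printf("built for %s/%s: %s\n",
		runtime.GOOS, runtime.GOARCH, describeArch(runtime.GOARCH))
}
```

Building this with `GOOS=linux GOARCH=arm64 go build` on an x86-64 machine produces an ARM binary without any emulation. Emulation such as QEMU typically only comes into play when an image build step has to execute binaries for the target architecture.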

A/B Testing and Rollout Strategy

  • Honeycomb conducted A/B testing to compare the performance and cost-effectiveness of Graviton instances.
  • They focused on factors such as correct results, latency, throughput, and the ability to scale down instances.
  • Honeycomb also considered hidden risks, such as the ability to roll back changes and the impact on their developers.
  • Honeycomb started the migration in their "dogfood" environment, a staging environment in which they run their own software, before rolling it out to production.
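
One way to turn a latency success criterion from an A/B test into a check is sketched below. This is an illustration under stated assumptions, not the method from the talk: the function names, the 5% tolerance, and the sample latencies are all made up, and the percentile uses the simple nearest-rank definition.

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

// percentile returns the nearest-rank p-th percentile (0 < p <= 100) of samples.
// It returns NaN for an empty input and does not modify the caller's slice.
func percentile(samples []float64, p float64) float64 {
	if len(samples) == 0 {
		return math.NaN()
	}
	sorted := append([]float64(nil), samples...)
	sort.Float64s(sorted)
	rank := int(math.Ceil(p / 100 * float64(len(sorted))))
	if rank < 1 {
		rank = 1
	}
	return sorted[rank-1]
}

// candidateOK reports whether the candidate fleet's p99 latency is no worse
// than the baseline's p99 plus a relative tolerance (0.05 means 5%).
// Hypothetical success criterion for illustration only.
func candidateOK(baseline, candidate []float64, tolerance float64) bool {
	return percentile(candidate, 99) <= percentile(baseline, 99)*(1+tolerance)
}

func main() {
	// Made-up latencies in milliseconds, not measurements from the talk.
	x86 := []float64{12, 15, 11, 40, 13, 14, 12, 16}
	graviton := []float64{10, 12, 11, 35, 12, 11, 10, 13}

	fmt.Printf("baseline p99=%.0f ms, candidate p99=%.0f ms, ok=%v\n",
		percentile(x86, 99), percentile(graviton, 99),
		candidateOK(x86, graviton, 0.05))
}
```

Fixing the criterion before the experiment starts keeps the rollout decision mechanical: if the check fails, the rollback plan applies. A real comparison would also cover correctness of results and throughput, which this sketch omits.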

Scaling the Migration to Kubernetes and Beyond

  • Honeycomb moved their scale-out workloads to Kubernetes (Amazon EKS), taking advantage of Graviton support in EKS.
  • They found that maintaining homogeneity in their Kubernetes clusters, using the same Graviton generation, was crucial for predictable performance and autoscaling.
  • Honeycomb also migrated their stateful workloads, such as Kafka brokers and their indexing engine, to Graviton instances, achieving significant performance and cost improvements.
  • For their serverless workloads, Honeycomb leveraged AWS Lambda with Graviton, encountering some initial challenges but ultimately realizing substantial cost savings.
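
The homogeneity point can be made concrete with a small check. The sketch below assumes the instance types of a node pool have already been collected (on Kubernetes they are exposed through the well-known `node.kubernetes.io/instance-type` node label) and tests whether they all belong to one instance family. The function names and the example instance types are hypothetical, and comparing families is a deliberate simplification: it is stricter than comparing Graviton generations, since two different families can share a generation.

```go
package main

import (
	"fmt"
	"strings"
)

// instanceFamily returns the part of an EC2 instance type before the first
// dot, e.g. "m7g" for "m7g.xlarge".
func instanceFamily(instanceType string) string {
	family, _, _ := strings.Cut(instanceType, ".")
	return family
}

// homogeneous reports whether every instance type belongs to the same family.
// An empty list is treated as homogeneous.
func homogeneous(instanceTypes []string) bool {
	for _, t := range instanceTypes {
		if instanceFamily(t) != instanceFamily(instanceTypes[0]) {
			return false
		}
	}
	return true
}

func main() {
	// Example pools are invented for illustration.
	uniform := []string{"m7g.xlarge", "m7g.2xlarge", "m7g.xlarge"}
	mixed := []string{"m6g.xlarge", "m7g.xlarge"}

	fmt.Println("uniform pool homogeneous:", homogeneous(uniform))
	fmt.Println("mixed pool homogeneous:", homogeneous(mixed))
}
```

A check like this could run as part of a deploy pipeline or a periodic audit, so that a mixed pool is noticed before it skews autoscaling signals.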

Results and Lessons Learned

  • Honeycomb has achieved a 2.2x price-performance improvement over the past 5 years, while scaling their infrastructure by 20x and only increasing their AWS bill by 13x.
  • They have seen rock-solid reliability and stability in their Graviton-powered infrastructure, with consistent performance improvements with each new Graviton generation.
  • Honeycomb's key lessons include:
    • Set clear goals and measure the impact of the migration.
    • Mitigate risks by defining success criteria and having a rollback plan.
    • Consider the impact on your team's bandwidth and the long-term viability of the architecture.
    • Prioritize homogeneity in your infrastructure for predictable performance.
    • Leverage the latest Graviton generations to maximize cost and performance benefits.

Resources and Next Steps

  • Honeycomb has published a book, "Observability Engineering," that covers their work on efficient data storage, including their Graviton migration.
  • The author, Liz Fong-Jones, can be reached on LinkedIn and on Bluesky for further discussion of this topic.
