Context and Motivation
- Honeycomb is a software-as-a-service observability company that helps developers gain insight into their systems by analyzing large volumes of telemetry data.
- Honeycomb has a strong focus on performance and aims to return 99.5% of queries in under 10 seconds.
- Honeycomb is a 100% AWS shop, ingesting signals from AWS and customer applications, and storing data using a custom columnar format.
- Honeycomb has grown significantly over the past 5 years, with a 20x increase in trace spans per second and a 20x increase in weekly queries.
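The latency target above can be read as a simple compliance ratio: the share of queries that finish in under 10 seconds should be at least 99.5%. A minimal Go sketch under that reading follows; the function name `sloCompliance` and the sample data are hypothetical and are not taken from Honeycomb's code.

```go
package main

import "fmt"

// sloCompliance returns the fraction of queries that finished in under
// thresholdSeconds. An empty input counts as fully compliant, because no
// query missed the target.
func sloCompliance(seconds []float64, thresholdSeconds float64) float64 {
	if len(seconds) == 0 {
		return 1
	}
	fast := 0
	for _, s := range seconds {
		if s < thresholdSeconds {
			fast++
		}
	}
	return float64(fast) / float64(len(seconds))
}

func main() {
	// Hypothetical sample: 999 fast queries and one slow query.
	durations := make([]float64, 0, 1000)
	for i := 0; i < 999; i++ {
		durations = append(durations, 2.0)
	}
	durations = append(durations, 30.0)

	const target = 0.995 // 99.5% of queries in under 10 seconds
	got := sloCompliance(durations, 10)
	fmt.Printf("compliance=%.4f meets target=%v\n", got, got >= target)
}
```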
The Evolution of CPU Architectures
- In the 1980s, mainframes had bespoke architectures, and software had to be specifically developed for each platform.
- The rise of desktop computing led to the dominance of the x86 and 68k architectures, and porting software between them was expensive.
- The mobile revolution led to the rise of ARM 32-bit chips, which were more power-efficient.
- The need for a powerful and cost-effective alternative to x86-64 led to the development of ARM 64-bit processors, such as AWS Graviton.
Honeycomb's Graviton Migration Journey
Evaluating the Risks and Benefits
- Honeycomb evaluated the potential benefits of Graviton, including better cost, performance, and environmental impact.
- They also identified the potential risks and worked to mitigate them, for example by protecting the user experience and defining clear success criteria up front.
- Honeycomb also considered the impact on their team's bandwidth and the long-term viability of the ARM architecture.
Porting the Software to ARM
- Honeycomb had to produce build artifacts for the ARM architecture, which required changes to their CI tooling.
- Because Honeycomb's services are written in Go, porting was relatively straightforward: the Go toolchain can cross-compile for multiple architectures.
- However, they ran into challenges when building Docker images for a non-native architecture, which required emulation with QEMU.
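To make the porting step concrete: the Go toolchain selects its build target from the `GOOS` and `GOARCH` environment variables, and a binary can report its own target through the `runtime` package. The sketch below is illustrative only; the helper `artifactName` and the service name `query-service` are hypothetical and do not describe Honeycomb's CI tooling.

```go
package main

import (
	"fmt"
	"runtime"
)

// artifactName builds a per-target artifact name such as
// "query-service-linux-arm64". Hypothetical naming scheme for illustration.
func artifactName(service, goos, goarch string) string {
	return service + "-" + goos + "-" + goarch
}

func main() {
	// runtime.GOOS and runtime.GOARCH report the target this binary was
	// compiled for, so the same source reports arm64 when built for
	// Graviton and amd64 when built for x86-64.
	fmt.Println("built for:", runtime.GOOS+"/"+runtime.GOARCH)

	// A CI job could emit one artifact per architecture it ships.
	for _, arch := range []string{"amd64", "arm64"} {
		fmt.Println(artifactName("query-service", "linux", arch))
	}
}
```

Running `GOOS=linux GOARCH=arm64 go build` on an x86-64 host produces an arm64 binary without emulation for pure-Go code. Packages that depend on cgo additionally need a C cross-toolchain, so that part should be checked per project.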
A/B Testing and Rollout Strategy
- Honeycomb ran A/B tests to compare the performance and cost-effectiveness of Graviton instances against their existing x86 instances.
- They focused on factors such as correct results, latency, throughput, and the ability to scale down instances.
- Honeycomb also considered hidden risks, such as the ability to roll back changes and the impact on their developers.
- Honeycomb started the migration in their "dogfood" environment, a staging environment that runs their own software, before rolling it out to production.
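One way to turn the criteria above into a pass/fail check is to compare tail latency and cost per request between the two cohorts. The sketch below is a hedged illustration: the `cohort` type, its fields, the 5% latency tolerance, and every number are assumptions, and it covers only latency and cost, not the correctness of results.

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

// cohort holds hypothetical measurements from one side of the experiment.
type cohort struct {
	latenciesMs  []float64 // per-request latency samples
	requestsPerS float64   // sustained throughput per instance
	dollarsPerHr float64   // hourly instance price
}

// percentile returns the nearest-rank p-th percentile, for 0 < p <= 100.
func percentile(samples []float64, p float64) float64 {
	if len(samples) == 0 {
		return math.NaN()
	}
	sorted := append([]float64(nil), samples...)
	sort.Float64s(sorted)
	rank := int(math.Ceil(p * float64(len(sorted)) / 100))
	if rank < 1 {
		rank = 1
	}
	return sorted[rank-1]
}

// costPerMillionRequests converts hourly price and throughput into dollars
// per one million requests served.
func costPerMillionRequests(c cohort) float64 {
	return c.dollarsPerHr / (c.requestsPerS * 3600) * 1e6
}

// passes reports whether the candidate's p99 latency is within the allowed
// regression of the baseline and its cost per request is strictly lower.
func passes(baseline, candidate cohort, maxLatencyRegression float64) bool {
	limit := percentile(baseline.latenciesMs, 99) * (1 + maxLatencyRegression)
	latencyOK := percentile(candidate.latenciesMs, 99) <= limit
	costOK := costPerMillionRequests(candidate) < costPerMillionRequests(baseline)
	return latencyOK && costOK
}

func main() {
	// All values below are made up for illustration.
	x86 := cohort{latenciesMs: []float64{12, 15, 18, 40}, requestsPerS: 1000, dollarsPerHr: 4}
	graviton := cohort{latenciesMs: []float64{11, 14, 17, 41}, requestsPerS: 1000, dollarsPerHr: 3}
	fmt.Println("candidate passes:", passes(x86, graviton, 0.05))
}
```

Writing the threshold down before the experiment starts is what makes the success criteria unambiguous, and a failed check is a natural trigger for the rollback path.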
Scaling the Migration to Kubernetes and Beyond
- Honeycomb moved their scale-out workloads to Kubernetes on Amazon EKS, taking advantage of EKS support for Graviton.
- They found that maintaining homogeneity in their Kubernetes clusters, using the same Graviton generation, was crucial for predictable performance and autoscaling.
- Honeycomb also migrated their stateful workloads, such as Kafka brokers and their indexing engine, to Graviton instances, achieving significant performance and cost improvements.
- For their serverless workloads, Honeycomb leveraged AWS Lambda with Graviton, encountering some initial challenges but ultimately realizing substantial cost savings.
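The homogeneity point can be checked mechanically. The following Go sketch flags a node group that mixes instance families, assuming the instance types have already been collected, for example from the `node.kubernetes.io/instance-type` node label. The helper names and the sample node list are hypothetical, and treating different sizes of the same family as homogeneous is an assumption of this sketch.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// instanceFamily returns the part of an EC2 instance type before the first
// dot, for example "m7g" for "m7g.2xlarge".
func instanceFamily(instanceType string) string {
	family, _, _ := strings.Cut(instanceType, ".")
	return family
}

// mixedFamilies returns the sorted distinct instance families in a node
// group when there is more than one, and nil when the group is homogeneous.
func mixedFamilies(instanceTypes []string) []string {
	seen := map[string]bool{}
	for _, t := range instanceTypes {
		seen[instanceFamily(t)] = true
	}
	if len(seen) <= 1 {
		return nil
	}
	families := make([]string, 0, len(seen))
	for f := range seen {
		families = append(families, f)
	}
	sort.Strings(families)
	return families
}

func main() {
	// Hypothetical node group that mixes two Graviton generations.
	nodes := []string{"m7g.2xlarge", "m7g.4xlarge", "m6g.2xlarge"}
	if mixed := mixedFamilies(nodes); mixed != nil {
		fmt.Println("mixed instance families:", mixed)
	} else {
		fmt.Println("node group is homogeneous")
	}
}
```

Mixed generations matter for autoscaling because a CPU-utilization target means different amounts of work on cores of different speeds, which is consistent with the predictability concern noted above.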
Results and Lessons Learned
- Honeycomb has achieved a 2.2x price-performance improvement over the past 5 years, while scaling their infrastructure by 20x and only increasing their AWS bill by 13x.
- They have seen rock-solid reliability and stability in their Graviton-powered infrastructure, with consistent performance improvements with each new Graviton generation.
- Honeycomb's key lessons include:
  - Set clear goals and measure the impact of the migration.
  - Mitigate risks by defining success criteria and having a rollback plan.
  - Consider the impact on your team's bandwidth and the long-term viability of the architecture.
  - Prioritize homogeneity in your infrastructure for predictable performance.
  - Leverage the latest Graviton generations to maximize cost and performance benefits.
Resources and Next Steps
- Honeycomb has published a book, "Observability Engineering," that covers their work on efficient data storage, including their Graviton migration.
- The author, Liz Fong-Jones, can be reached on LinkedIn and on Bluesky for further discussion of this topic.