AWS re:Invent 2025 - Why is Reliability So Hard? (DVT227)
Summary of AWS re:Invent 2025 Presentation: "Why is Reliability So Hard?"
Introduction
Presenter: Hannes Lenke, CEO and co-founder of Checkly
Checkly helps organizations detect, communicate, and resolve software reliability issues faster
Goal is to help engineers "own reliability from pull request to postmortem"
The Evolution of Software Reliability
A decade ago, software was built and shipped much less frequently (yearly, quarterly, monthly)
Today, software is built and shipped almost instantly, but reliability has not kept pace
Yesterday's applications were simple, with few dependencies - issues were easy to identify
Modern applications are highly complex, with many dependencies that introduce potential failure points
The Reliability Challenge
Increased complexity and dependencies make it harder to ensure reliability
The traditional responses (adding more people, more processes, and more testing) have not solved the problem
Both development and operations teams try to validate application functionality, but in siloed ways
Key Principles for High-Performing Teams
Predictability: Ability to predict how an application will behave when released to production
Accountability: Knowing what changed, who changed it, why, and when, in order to identify the root cause of issues
Resiliency: Building applications that can be quickly rolled back in the event of problems
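The accountability and resiliency principles above can be sketched in a few lines of Python. This is a minimal illustration, not something shown in the talk: the `Deploy` record, the `suspect_deploy` and `rollback_target` helpers, and the sample history are all hypothetical names and data chosen for the example. It assumes every release is recorded with what changed, who changed it, why, and when.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass(frozen=True)
class Deploy:
    """Hypothetical change record: what, who, why, and when."""
    version: str            # what changed (release tag or commit SHA)
    author: str             # who changed it
    reason: str             # why it changed
    deployed_at: datetime   # when it went live


def suspect_deploy(history: list[Deploy], incident_at: datetime) -> Optional[Deploy]:
    """Return the most recent deploy at or before the incident, if any."""
    earlier = [d for d in history if d.deployed_at <= incident_at]
    return max(earlier, key=lambda d: d.deployed_at, default=None)


def rollback_target(history: list[Deploy], suspect: Deploy) -> Optional[Deploy]:
    """Return the deploy that was live immediately before the suspect one."""
    previous = [d for d in history if d.deployed_at < suspect.deployed_at]
    return max(previous, key=lambda d: d.deployed_at, default=None)


# Illustrative data only.
history = [
    Deploy("v1.4.0", "alice", "add checkout retries", datetime(2025, 12, 1, 9, 0)),
    Deploy("v1.5.0", "bob", "upgrade payment SDK", datetime(2025, 12, 2, 14, 30)),
    Deploy("v1.6.0", "carol", "new pricing page", datetime(2025, 12, 3, 11, 15)),
]

incident = datetime(2025, 12, 2, 15, 0)
suspect = suspect_deploy(history, incident)
target = rollback_target(history, suspect) if suspect else None
```

With this sample history, an incident at 15:00 on 2 December points at `v1.5.0` as the first change to inspect and `v1.4.0` as the version to roll back to. The most recent deploy is only a starting point for investigation, not proof of root cause.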
Checkly's Approach
Unifies testing and monitoring into a single, version-controlled workflow
Allows teams to build tests (UI, API, uptime) as code and deploy them for continuous monitoring
Integrates the reliability pipeline with the CI/CD pipeline, enabling a common language and visibility
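To make "tests as code" concrete, here is a minimal Python sketch of the general idea. It is not the vendor's API: `ApiCheck`, `run_check`, `run_suite`, the example URLs, and the latency thresholds are all assumptions made for illustration. The point it shows is that one version-controlled check definition can be executed both as a pre-release test in CI and as a scheduled monitor in production, with only the fetcher changing.

```python
from dataclasses import dataclass
from typing import Callable

# A fetcher takes a URL and returns (status_code, latency_ms).
# In CI it might hit a preview environment; in production, the live service.
Fetcher = Callable[[str], tuple[int, float]]


@dataclass(frozen=True)
class ApiCheck:
    """Hypothetical check definition that lives in the repository."""
    name: str
    url: str
    expected_status: int = 200
    max_latency_ms: float = 500.0


def run_check(check: ApiCheck, fetch: Fetcher) -> list[str]:
    """Run one check and return failure messages (an empty list means pass)."""
    status, latency_ms = fetch(check.url)
    failures = []
    if status != check.expected_status:
        failures.append(
            f"{check.name}: expected {check.expected_status}, got {status}"
        )
    if latency_ms > check.max_latency_ms:
        failures.append(
            f"{check.name}: {latency_ms:.0f} ms exceeds {check.max_latency_ms:.0f} ms"
        )
    return failures


def run_suite(checks: list[ApiCheck], fetch: Fetcher) -> dict[str, list[str]]:
    """Run every check and map each check name to its failures."""
    return {c.name: run_check(c, fetch) for c in checks}


# Check definitions, reviewed and versioned like any other code.
CHECKS = [
    ApiCheck("health", "https://api.example.com/health"),
    ApiCheck("checkout", "https://api.example.com/checkout", max_latency_ms=300.0),
]


def fake_fetch(url: str) -> tuple[int, float]:
    """Stand-in fetcher so the sketch runs without network access."""
    return (200, 120.0) if url.endswith("/health") else (200, 450.0)


results = run_suite(CHECKS, fake_fetch)
```

Here the `health` check passes and the `checkout` check fails on latency. Injecting the fetcher is a design choice for the sketch: a CI job and a monitoring scheduler could each pass their own implementation while sharing the same `CHECKS` list, which is what gives development and operations a common definition of "working".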
The Evolving "You" in Software Reliability
Traditionally, "you" referred to the developer or engineer responsible for the code
Today, "you" encompasses anyone or anything that touches the user experience, including AI agents, Claude Code, and other tools
In the future, agents may be capable of building, testing, monitoring, and owning more of the software lifecycle
Key Takeaways
Reliability has not kept pace with the rapid evolution of software development
Increased complexity and dependencies make it harder to ensure reliability using traditional approaches
High-performing teams focus on predictability, accountability, and resiliency to improve reliability
Checkly's approach unifies testing and monitoring, integrating the reliability pipeline with CI/CD
The concept of "you" in software reliability is expanding to include a wider range of stakeholders and tools