AWS re:Invent 2025 - How Netflix Connects Product Experiments to its AWS Bill (IND388)
Connecting Product Experiments to the AWS Bill at Netflix
Challenges of Experimentation at Scale
Netflix relies heavily on A/B testing to drive product innovation
Hundreds or thousands of experiments run simultaneously on shared infrastructure
Difficulty attributing infrastructure usage and costs to individual experiments
Mismatch between the coarse granularity of infrastructure metrics and costs and the feature-level granularity of experiments
Netflix's Attribution and Projection Framework
Attribution: Tracing Signals to Usage Deltas
The goal is to identify significant changes in infrastructure usage between an experiment's treatment and control groups
Leverage distributed tracing to track request flows across microservices
Overcome challenges of sampling, data quality, and scale in tracing data
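The attribution step above can be sketched as follows. This is a minimal illustration, not Netflix's implementation: the span format, the `usage_deltas` function, and the uniform `sample_rate` are all hypothetical, and a real system would also test each delta for statistical significance before reporting it.

```python
from collections import defaultdict


def usage_deltas(spans, assignments, sample_rate=1.0):
    """Return the per-service relative change in requests per user,
    treatment vs. control.

    spans: iterable of (user_id, service) pairs, one per sampled trace span
           (hypothetical format).
    assignments: dict mapping user_id -> "treatment" or "control".
    sample_rate: fraction of requests that were traced (assumed uniform).
    """
    counts = {"treatment": defaultdict(float), "control": defaultdict(float)}
    for user_id, service in spans:
        group = assignments.get(user_id)
        if group is None:
            continue  # span from a user outside the experiment
        # Scale each sampled span up to the traffic it represents.
        counts[group][service] += 1.0 / sample_rate

    users = {"treatment": 0, "control": 0}
    for group in assignments.values():
        users[group] += 1
    if users["treatment"] == 0 or users["control"] == 0:
        return {}  # no comparison is possible without both groups

    deltas = {}
    services = set(counts["treatment"]) | set(counts["control"])
    for service in services:
        control_rate = counts["control"][service] / users["control"]
        treatment_rate = counts["treatment"][service] / users["treatment"]
        if control_rate > 0:
            deltas[service] = treatment_rate / control_rate - 1.0
    return deltas
```

Normalizing by the number of users in each group keeps the comparison fair when the treatment and control cells differ in size.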
Estimation: Translating Usage Deltas to Cost Projections
Train machine learning models to learn the relationship between usage patterns and infrastructure costs
Simulate the cost impact of an experiment by applying its usage deltas to production cost models
Aggregate cost projections across affected services to estimate total experiment cost
Putting the Framework into Practice
Example: "Smarter Prefetch" Experiment
Experiment hypothesis: Preloading content can improve app responsiveness
Attribution identified a 40% increase in requests to the metadata service
Estimation projected a $750,000 increase in annual AWS costs if the experiment were rolled out globally
Lessons Learned and Future Roadmap
Key Hurdles Overcome
Ensuring trace completeness across the infrastructure
Adapting to constantly evolving infrastructure
Handling statically provisioned vs. auto-scaling services
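One way to see why statically provisioned and auto-scaling services need different handling is the sketch below. The summary does not describe how Netflix models either case, so both functions, their parameters, and the headroom rule are assumptions made for illustration only.

```python
import math


def autoscaled_cost_delta(baseline_instances, relative_delta, cost_per_instance):
    """Assumed model: an auto-scaling fleet grows in proportion to usage."""
    new_instances = math.ceil(baseline_instances * (1 + relative_delta))
    return (new_instances - baseline_instances) * cost_per_instance


def static_cost_delta(provisioned_instances, utilization, relative_delta,
                      target_utilization, cost_per_instance):
    """Assumed model: a fixed fleet costs nothing extra until the added
    usage pushes utilization past its target, then it must be resized."""
    new_utilization = utilization * (1 + relative_delta)
    if new_utilization <= target_utilization:
        return 0.0  # existing headroom absorbs the extra load
    needed = math.ceil(provisioned_instances * new_utilization / target_utilization)
    return (needed - provisioned_instances) * cost_per_instance
```

Under these assumptions, the same usage delta yields an immediate cost change on an auto-scaling service but a step change, possibly zero, on a statically provisioned one.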
Unlocking Operational Excellence
Proactive capacity planning based on experiment usage projections
"Shift left" validation of architectural impacts during experimentation
Holistic performance insights linking infrastructure to user experience
Future Expansion
Incorporate storage costs and batch processing workloads
Apply framework to broader set of infrastructure domains beyond consumer-facing systems
Key Takeaways
Attribution is foundational to understanding experiment impact on infrastructure
Translating usage deltas to cost projections enables proactive, data-driven decision making
Democratizing this data across product and engineering teams is crucial
Treating cost as a design constraint can lead to better products and more efficient infrastructure