AWS re:Invent 2025 - How Netflix Connects Product Experiments to its AWS Bill (IND388)

Connecting Product Experiments to the AWS Bill at Netflix

Challenges of Experimentation at Scale

  • Netflix relies heavily on A/B testing to drive product innovation
  • Hundreds or thousands of experiments run simultaneously on shared infrastructure
  • Difficulty attributing infrastructure usage and costs to individual experiments
  • Mismatch between granularity of infrastructure metrics/costs and feature-level experimentation

Netflix's Attribution and Projection Framework

Attribution: Tracing Signals to Usage Deltas

  • Goal is to identify significant changes in infrastructure usage between experiment treatment and control groups
  • Leverage distributed tracing to track request flows across microservices
  • Overcome challenges of sampling, data quality, and scale in tracing data
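The attribution step can be illustrated with a minimal sketch. It assumes trace spans have already been reduced to hypothetical `(user_id, cell, service)` tuples, and it uses Welch's t statistic as a stand-in significance check; the summary does not say which statistical test or span schema Netflix actually uses.

```python
from collections import defaultdict
from math import sqrt
from statistics import mean, variance


def usage_delta(spans, service, t_threshold=2.0):
    """Compare per-user call counts to `service` between experiment cells.

    `spans` is an iterable of (user_id, cell, service) tuples, where cell is
    "control" or "treatment". The schema and threshold are assumptions.
    Each cell needs at least two users for the variance to be defined.
    """
    counts = {"control": defaultdict(int), "treatment": defaultdict(int)}
    for user_id, cell, svc in spans:
        # Register every user, so users who never call `service` count as 0.
        counts[cell][user_id] += 1 if svc == service else 0

    control = list(counts["control"].values())
    treatment = list(counts["treatment"].values())
    m_c, m_t = mean(control), mean(treatment)

    # Welch's t statistic: difference in means over its standard error.
    se = sqrt(variance(control) / len(control)
              + variance(treatment) / len(treatment))
    if se > 0:
        t_stat = (m_t - m_c) / se
    else:
        t_stat = 0.0 if m_t == m_c else float("inf")

    return {
        "control_mean": m_c,
        "treatment_mean": m_t,
        "relative_delta": (m_t - m_c) / m_c if m_c else None,
        "t_stat": t_stat,
        "significant": abs(t_stat) > t_threshold,
    }
```

Counting users with zero calls matters here: dropping them would bias the per-user mean upward in whichever cell calls the service less often.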

Estimation: Translating Usage Deltas to Cost Projections

  • Train machine learning models to learn the relationship between usage patterns and infrastructure costs
  • Simulate the cost impact of an experiment by applying its usage deltas to production cost models
  • Aggregate cost projections across all affected services to estimate the total experiment cost

Putting the Framework into Practice

Example: "Smarter Prefetch" Experiment

  • Experiment hypothesis: Preloading content can improve app responsiveness
  • Attribution identified a 40% increase in requests to the metadata service
  • Estimation projected a $750,000 increase in annual AWS costs if the experiment were rolled out globally

Lessons Learned and Future Roadmap

Key Hurdles Overcome

  1. Ensuring trace completeness across the infrastructure
  2. Adapting to constantly evolving infrastructure
  3. Handling statically provisioned vs. auto-scaling services
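One way to see why the third hurdle matters is the sketch below. It assumes, purely for illustration, that an auto-scaling service adds instances as soon as load grows, while a statically provisioned service only adds instances once projected peak load exceeds its fixed fleet. The function name, parameters, and per-instance capacity are hypothetical; the summary does not describe how Netflix models this.

```python
from math import ceil


def added_instances(peak_rps, relative_delta, rps_per_instance,
                    provisioned=None):
    """Estimate extra instances needed after a usage delta.

    provisioned=None models an auto-scaling service sized to current load.
    An integer models a statically provisioned fleet of that size.
    """
    needed_after = ceil(peak_rps * (1 + relative_delta) / rps_per_instance)
    if provisioned is None:
        # Auto-scaling: fleet size tracks load, so any growth adds instances.
        needed_before = ceil(peak_rps / rps_per_instance)
        return needed_after - needed_before
    # Static: existing headroom absorbs growth until it runs out.
    return max(0, needed_after - provisioned)
```

Under these assumptions, the same usage delta can project to zero added cost for a service with spare static headroom and to a nonzero cost for an auto-scaling one, so the two cases cannot share one cost curve.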

Unlocking Operational Excellence

  • Proactive capacity planning based on experiment usage projections
  • "Shift left" validation of architectural impacts during experimentation
  • Holistic performance insights linking infrastructure to user experience

Future Expansion

  • Incorporate storage costs and batch processing workloads
  • Apply the framework to a broader set of infrastructure domains beyond consumer-facing systems

Key Takeaways

  1. Attribution is foundational to understanding experiment impact on infrastructure
  2. Translating usage deltas to cost projections enables proactive, data-driven decision making
  3. Democratizing this data across product and engineering teams is crucial
  4. Treating cost as a design constraint can lead to better products and more efficient infrastructure
