AWS re:Invent 2025 - DraftKings: Scaling to 1M operations/min on Aurora during Super Bowl (DAT320)

Scaling the DraftKings Financial Ledger on Amazon Aurora MySQL

Overview

  • DraftKings is a sports entertainment and gaming company operating in online sportsbook, iGaming, fantasy sports, and lottery
  • DraftKings has to handle massive scale during peak events like NFL Sundays, with millions of paying customers and high-velocity transactions

DraftKings' Scale Challenges

  1. Deposit Volume: Massive spikes in deposit transactions as customers fund their accounts before games start
  2. Debit Traffic: Equally high volume of debit transactions as customers place bets and enter contests
  3. In-Game Read Traffic: Customers constantly checking balances and bet statuses during games, driving huge read loads
  4. Post-Game Payout Spike: After games end, systems must rapidly pay out winnings, causing a 30x spike in transactions

The DraftKings Financial Ledger Architecture

  • The financial ledger is a central service running on Amazon EKS, handling synchronous debit traffic and asynchronous payout instructions
  • The ledger is built on Amazon Aurora MySQL, leveraging key features to handle the scale:
    • Read replicas to offload read traffic and keep the writer free for updates
    • Ability to rapidly scale out read replicas as needed
    • Extremely low replica lag (under 15 ms), so balance reads reflect recent writes almost immediately
    • Automated failover capabilities to quickly recover from hardware failures
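The read/write split described above can be sketched as a simple routing rule: read-only statements go to Aurora's reader endpoint (which load-balances across replicas), and everything else goes to the writer endpoint. This is a minimal illustration, not DraftKings' actual implementation; the endpoint names are hypothetical placeholders.

```python
# Hypothetical Aurora endpoints. A real Aurora cluster exposes a writer
# (cluster) endpoint and a reader endpoint that balances across replicas.
WRITER_ENDPOINT = "ledger.cluster-example.us-east-1.rds.amazonaws.com"
READER_ENDPOINT = "ledger.cluster-ro-example.us-east-1.rds.amazonaws.com"

def endpoint_for(sql: str) -> str:
    """Route read-only statements to the reader endpoint.

    Anything that might write (UPDATE, INSERT, DDL, or a statement we
    cannot classify) is conservatively sent to the writer.
    """
    stripped = sql.lstrip()
    first_word = stripped.split(None, 1)[0].upper() if stripped else ""
    if first_word in ("SELECT", "SHOW"):
        return READER_ENDPOINT
    return WRITER_ENDPOINT
```

In practice this routing often lives in a connection-pool layer (or a proxy such as Amazon RDS Proxy) rather than in application code, but the decision boundary is the same: keep the writer free for ledger updates.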

Optimizing for 1 Million Ops/Minute

  • Profiling and identifying performance bottlenecks is crucial, using tools like Amazon CloudWatch Database Insights
  • Optimizing stored procedures, avoiding slow operations like table scans and unnecessary temp table usage
  • Separating reads and writes is key: directing read-only queries to read replicas avoids impacting the writer
  • Scaling out the ledger by sharding based on user identifiers, distributing traffic across multiple Aurora clusters

Business Impact

  • Cost savings by scaling out with smaller Aurora instances instead of a single large instance
  • Improved data warehouse performance by parallelizing CDC replication across multiple Aurora clusters
  • Seamless handling of hardware failures and spikes in traffic due to Aurora's automated scaling and failover capabilities

Key Takeaways

  1. Separate reads and writes to maximize throughput and avoid bottlenecks on the writer
  2. Leverage Aurora's read replica capabilities to offload read traffic and maintain low latency
  3. Profile and optimize performance at the database level, identifying and addressing slow operations
  4. Scale out the architecture by sharding data across multiple Aurora clusters to linearly increase throughput
  5. Aurora's managed service capabilities, like automated scaling and failover, are crucial for handling unpredictable spikes in traffic
