AWS re:Invent 2025 - DraftKings: Scaling to 1M operations/min on Aurora during Super Bowl (DAT320)
Scaling the DraftKings Financial Ledger on Amazon Aurora MySQL
Overview
DraftKings is a sports entertainment and gaming company operating in online sportsbook, iGaming, fantasy sports, and lottery
During peak events such as NFL Sundays, it must handle massive scale: millions of paying customers generating high-velocity financial transactions
DraftKings' Scale Challenges
Deposit Volume : Massive spikes in deposit transactions as customers fund their accounts before games start
Debit Traffic : Equally high volume of debit transactions as customers place bets and enter contests
In-Game Read Traffic : Customers constantly checking balances and bet statuses during games, driving huge read loads
Post-Game Payout Spike : After games end, all systems need to rapidly pay out winnings, causing a 30x spike in transactions
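To put these numbers in concrete terms, a back-of-the-envelope sketch: the 1M ops/min target comes from the session title and the 30x multiplier from the talk, while the baseline rate used below is a purely hypothetical example value.

```python
# Back-of-the-envelope sizing for the ledger's peak load.
# PEAK_OPS_PER_MIN is from the session title; POST_GAME_MULTIPLIER is
# from the talk; baseline_ops_per_min is a hypothetical example value.

PEAK_OPS_PER_MIN = 1_000_000
peak_ops_per_sec = PEAK_OPS_PER_MIN / 60                 # about 16,667 ops/s

POST_GAME_MULTIPLIER = 30
baseline_ops_per_min = 25_000                            # hypothetical steady state
payout_spike = baseline_ops_per_min * POST_GAME_MULTIPLIER

print(round(peak_ops_per_sec))   # 16667
print(payout_spike)              # 750000
```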
The DraftKings Financial Ledger Architecture
The financial ledger is a central service running on Amazon EKS, handling synchronous debit traffic and asynchronous payout instructions
The ledger is built on Amazon Aurora MySQL, leveraging key features to handle the scale:
Read replicas to offload read traffic and keep the writer free for updates
Ability to rapidly scale out read replicas as needed
Very low read replica lag (under 15ms), so balance reads from replicas reflect updates almost immediately
Automated failover capabilities to quickly recover from hardware failures
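The read/write split described above can be sketched as a simple query router. Aurora exposes one writer (cluster) endpoint and one load-balanced reader endpoint per cluster; the hostnames and the operation names used for classification here are assumptions for illustration.

```python
# Minimal sketch of routing ledger traffic between Aurora endpoints.
# Both hostnames below are hypothetical placeholders; Aurora provides a
# writer (cluster) endpoint and a reader endpoint per cluster.

WRITER_ENDPOINT = "ledger.cluster-xyz.us-east-1.rds.amazonaws.com"     # hypothetical
READER_ENDPOINT = "ledger.cluster-ro-xyz.us-east-1.rds.amazonaws.com"  # hypothetical

# Hypothetical operation names: read-only queries go to replicas.
READ_ONLY_OPS = {"get_balance", "get_bet_status"}

def route(operation: str) -> str:
    """Send read-only traffic to replicas; keep the writer free for debits and payouts."""
    return READER_ENDPOINT if operation in READ_ONLY_OPS else WRITER_ENDPOINT

print(route("get_balance"))    # reader endpoint
print(route("debit_account"))  # writer endpoint
```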
Optimizing for 1 Million Ops/Minute
Profiling and identifying performance bottlenecks is crucial, using tools like Amazon CloudWatch Database Insights
Optimizing stored procedures, avoiding slow operations like table scans and unnecessary temp table usage
Separating reads and writes is key - directing read-only queries to read replicas to avoid impacting the writer
Scaling out the ledger by sharding based on user identifiers, distributing traffic across multiple Aurora clusters
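Sharding by user identifier can be sketched as a stable hash over the user ID mapped onto a fixed set of cluster endpoints; the shard count and endpoint names below are assumptions, not DraftKings' actual topology.

```python
import hashlib

# Hypothetical Aurora cluster writer endpoints, one per shard.
SHARDS = [
    "ledger-shard-0.cluster-aaa.us-east-1.rds.amazonaws.com",
    "ledger-shard-1.cluster-bbb.us-east-1.rds.amazonaws.com",
    "ledger-shard-2.cluster-ccc.us-east-1.rds.amazonaws.com",
    "ledger-shard-3.cluster-ddd.us-east-1.rds.amazonaws.com",
]

def shard_for_user(user_id: str) -> str:
    """Map a user ID to a shard so that user's ledger rows always land on one cluster.

    md5 (rather than Python's built-in hash) keeps the mapping stable
    across processes and restarts.
    """
    digest = hashlib.md5(user_id.encode("utf-8")).hexdigest()
    return SHARDS[int(digest, 16) % len(SHARDS)]

# The same user always maps to the same cluster, while different users
# spread across clusters, so aggregate throughput scales with shard count.
assert shard_for_user("user-42") == shard_for_user("user-42")
```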
Business Impact
Cost savings by scaling out with smaller Aurora instances instead of a single large instance
Improved data warehouse performance by parallelizing CDC replication across multiple Aurora clusters
Seamless handling of hardware failures and spikes in traffic due to Aurora's automated scaling and failover capabilities
Key Takeaways
Separate reads and writes to maximize throughput and avoid bottlenecks on the writer
Leverage Aurora's read replica capabilities to offload read traffic and maintain low latency
Profile and optimize performance at the database level, identifying and addressing slow operations
Scale out the architecture by sharding data across multiple Aurora clusters to linearly increase throughput
Aurora's managed service capabilities, like automated scaling and failover, are crucial for handling unpredictable spikes in traffic