AWS re:Invent 2025 - Maximizing uptime: Itaú’s mission-critical mainframe migration to AWS (IND3304)

Itaú's Mission-Critical Mainframe Migration to AWS

Introduction

  • Itaú Bank is one of the largest banks in Latin America, with over 70 million customers and a 100-year history.
  • Itaú recognized the need to modernize its core banking systems to keep up with changing customer behaviors and digital transformation in the financial industry.
  • The bank decided to migrate its mission-critical checking account platform, which had been running on a mainframe for over 50 years, to the cloud on AWS.

Key Challenges and Requirements

  • Ensure high availability, with a target of 99.99% uptime (at most about 4.3 minutes of downtime in a 30-day month).
  • Maintain absolute accuracy and consistency of customer account balances, a non-negotiable requirement.
  • Handle massive scale: authorize up to 6,000 transactions per second at peak, and support individual large accounts that receive up to 1,000 transactions per second.
  • Limit the blast radius of failures, so that an incident affects only a subset of customers rather than the entire customer base.
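The availability target and the blast-radius goal both come down to simple arithmetic. The sketch below is illustrative only. It assumes a 30-day month and customers spread evenly across cells, and the cell counts are hypothetical examples, not figures from the talk.

```python
# Downtime budget implied by an availability target (assumes a 30-day month).
MINUTES_PER_MONTH = 30 * 24 * 60  # 43,200 minutes


def downtime_budget_minutes(availability: float, minutes: int = MINUTES_PER_MONTH) -> float:
    """Maximum minutes of downtime allowed while still meeting `availability`."""
    return (1.0 - availability) * minutes


def blast_radius(failed_cells: int, total_cells: int) -> float:
    """Fraction of customers affected, assuming customers are spread evenly across cells."""
    return failed_cells / total_cells


print(f"99.99% -> {downtime_budget_minutes(0.9999):.2f} min/month")  # 99.99% -> 4.32 min/month
print(f"1 of 10 cells down -> {blast_radius(1, 10):.0%} of customers")  # 10% of customers
```

With a single monolithic deployment, any outage counts against all customers at once. Splitting customers across N cells means a failure in one cell affects roughly 1/N of them.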

Architecture: Cell-Based Design for High Availability

  • Adopted a cell-based architecture using the Kámadávara (KVA) framework, which provides isolation, fault tolerance, and scalability.
  • Each cell is a fully autonomous unit that owns its own data and can operate independently.
  • The router layer directs customer transactions to the appropriate cell based on a partitioning strategy (e.g., a full mapping table, prefix or range-based mapping, or consistent hashing).
  • Each cell has multiple replicas (active-active or active-standby) to ensure high availability, with replicas placed in different Availability Zones.
  • The journal component coordinates data replication across cell replicas using a quorum-based approach to ensure consistency.
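The routing and journal ideas above can be sketched together. This is a minimal illustration, not the framework Itaú used. It picks consistent hashing from the listed partitioning strategies and models the journal as a majority quorum over replicas in different zones. The class names `CellRouter` and `CellJournal`, the cell and zone names, and the `vnodes` parameter are all hypothetical.

```python
import bisect
import hashlib


def stable_hash(key: str) -> int:
    # Python's built-in hash() is salted per process, so use a digest instead.
    return int.from_bytes(hashlib.sha256(key.encode()).digest()[:8], "big")


class CellRouter:
    """Hypothetical consistent-hashing ring that maps customer IDs to cells."""

    def __init__(self, cells: list[str], vnodes: int = 100) -> None:
        # Each cell owns `vnodes` points on the ring to even out the key distribution.
        self._ring = sorted(
            (stable_hash(f"{cell}#{i}"), cell) for cell in cells for i in range(vnodes)
        )
        self._points = [point for point, _ in self._ring]

    def cell_for(self, customer_id: str) -> str:
        # First ring point at or after the key's hash; wrap to the start past the end.
        idx = bisect.bisect_left(self._points, stable_hash(customer_id)) % len(self._points)
        return self._ring[idx][1]


class CellJournal:
    """Hypothetical journal: an entry commits once a majority of replicas acknowledge it."""

    def __init__(self, replica_zones: list[str]) -> None:
        self.logs: dict[str, list[dict]] = {zone: [] for zone in replica_zones}
        self.unavailable: set[str] = set()  # zones simulated as failed

    def append(self, entry: dict) -> bool:
        acks = 0
        for zone, log in self.logs.items():
            if zone not in self.unavailable:
                log.append(entry)
                acks += 1
        # Majority quorum: with 3 replicas, 2 acks are enough to commit.
        # (A real protocol must also repair replicas that hold uncommitted entries.)
        return acks >= len(self.logs) // 2 + 1


router = CellRouter(["cell-a", "cell-b", "cell-c"])
journal = CellJournal(["az-1", "az-2", "az-3"])
cell = router.cell_for("customer-42")
committed = journal.append({"cell": cell, "txn": "t-1", "amount_cents": -500})
```

Consistent hashing is attractive here because removing or adding a cell only remaps the customers whose ring segment changed. Everyone else stays in their current cell. The majority quorum lets a three-replica cell keep committing with one Availability Zone down.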

Database Selection: Amazon DynamoDB for Consistency and Performance

  • Evaluated various database options, including SQL Server, Amazon Aurora, and Amazon DynamoDB.
  • Chose Amazon DynamoDB due to its ability to provide the required high availability, predictable performance, and ACID (Atomicity, Consistency, Isolation, Durability) transactions.
  • Designed the DynamoDB table schema to leverage partition keys and sort keys to achieve idempotency and isolation for account balance updates.
  • Leveraged DynamoDB Transactions API to perform atomic, consistent, and durable updates to account balances.
  • Achieved high performance for large accounts, reaching an average of 1,200 transactions per second with 79 ms latency.
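One way to combine partition keys, sort keys, and transactions for idempotent balance updates is sketched below. The table name, the `pk`/`sk` key schema, the `ACCOUNT#`/`TXN#`/`BALANCE` prefixes, and amounts in integer cents are hypothetical assumptions, not Itaú's actual schema. The function only builds the request for a low-level boto3 DynamoDB client's `transact_write_items` call. It writes a transaction record that fails if the same transaction ID already exists, and in the same transaction debits the balance item only if funds are sufficient.

```python
def build_debit_request(table: str, account_id: str, txn_id: str, amount_cents: int) -> dict:
    """Build kwargs for a low-level boto3 DynamoDB client's transact_write_items call.

    Hypothetical schema: items share pk="ACCOUNT#<id>"; the balance item has
    sk="BALANCE" and each applied transaction has sk="TXN#<txn_id>".
    """
    pk = {"S": f"ACCOUNT#{account_id}"}
    return {
        # Retrying with the same token is idempotent for a limited window (max 36 chars).
        "ClientRequestToken": txn_id,
        "TransactItems": [
            {
                # 1) Record the transaction; the condition fails if txn_id was already applied.
                "Put": {
                    "TableName": table,
                    "Item": {
                        "pk": pk,
                        "sk": {"S": f"TXN#{txn_id}"},
                        "amount_cents": {"N": str(-amount_cents)},
                    },
                    "ConditionExpression": "attribute_not_exists(pk)",
                }
            },
            {
                # 2) Debit the balance item; the condition fails if funds are insufficient.
                "Update": {
                    "TableName": table,
                    "Key": {"pk": pk, "sk": {"S": "BALANCE"}},
                    "UpdateExpression": "SET #bal = #bal - :amt",
                    "ConditionExpression": "#bal >= :amt",
                    "ExpressionAttributeNames": {"#bal": "balance"},
                    "ExpressionAttributeValues": {":amt": {"N": str(amount_cents)}},
                }
            },
        ],
    }
```

A caller would pass the result to `boto3.client("dynamodb").transact_write_items(**request)`. If either condition fails, DynamoDB cancels the whole transaction and the client raises `TransactionCanceledException`. A duplicate delivery of the same transaction therefore cannot debit the account twice.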

Migration Strategy: Gradual Dark Launch and Validation

  • Implemented a dark launch strategy, where the new architecture processes all transactions in parallel with the existing mainframe-based system.
  • In the shadow traffic mode, the new architecture validates each transaction against the existing system to ensure correctness before migrating customers.
  • Gradually migrated customers from the legacy system to the new cloud-based platform, ensuring a seamless transition.
  • Used AWS services such as Amazon Managed Streaming for Apache Kafka (Amazon MSK) and Amazon SQS to decouple the synchronous authorization flow from the asynchronous dispatch process.
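The shadow-traffic and gradual-migration steps can be sketched as a single routing function. This is a simplified illustration under stated assumptions. The `Authorization` type, the `legacy`/`cloud` callables, the 100-bucket cohort scheme, and the mismatch list are hypothetical. Each authorizer takes only a customer ID for brevity, and the two calls run sequentially here rather than in parallel.

```python
import hashlib
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Authorization:
    approved: bool
    balance_after_cents: int


Authorizer = Callable[[str], Authorization]


def in_migrated_cohort(customer_id: str, rollout_percent: int) -> bool:
    """Deterministically assign a customer to one of 100 buckets; low buckets migrate first."""
    bucket = int.from_bytes(hashlib.sha256(customer_id.encode()).digest()[:4], "big") % 100
    return bucket < rollout_percent


def authorize(
    customer_id: str,
    legacy: Authorizer,
    cloud: Authorizer,
    rollout_percent: int,
    mismatches: list,
) -> Authorization:
    if in_migrated_cohort(customer_id, rollout_percent):
        # Migrated customer: the cloud platform is the system of record.
        return cloud(customer_id)
    # Shadow mode: the mainframe answer is returned; the cloud result is only compared.
    legacy_result = legacy(customer_id)
    shadow_result = cloud(customer_id)
    if shadow_result != legacy_result:
        mismatches.append((customer_id, legacy_result, shadow_result))
    return legacy_result
```

Because bucket assignment is deterministic, raising `rollout_percent` only ever adds customers to the migrated cohort. A customer never flips back and forth between systems. An empty mismatch list over a representative period is the kind of evidence that would justify the next increase.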

Key Lessons and Takeaways

  • Collaboration, experimentation, and a shared purpose are essential for successful transformation.
  • Invest in building a strong foundation, understanding how each AWS service fits your use case, and validating performance before going to production.
  • Avoid complex solutions when simpler ones can achieve the same goals. Focus on trade-offs in operations, observability, and cost.
  • Leverage AWS resources, such as free digital courses and labs, to accelerate your learning and development.
  • Customer obsession should be a guiding principle, as demonstrated by the DynamoDB team's responsiveness to Itaú's support case.

Conclusion

Itaú's journey to modernize its mission-critical checking account platform showcases the power of cloud-based architectures, the importance of high availability and data consistency, and the value of a well-planned migration strategy. By leveraging AWS services and adopting a cell-based design, Itaú was able to achieve its goals of scalability, resilience, and customer-centricity, setting the stage for continued innovation and growth.
