BBVA: Building a Multi-Region, Multi-Country Data Platform at Scale
Introduction
- BBVA, a global financial institution serving more than 70 million customers across over 25 countries, announced a strategic partnership with AWS in June 2023 as part of becoming a data- and AI-driven organization.
- The goal was to use AWS analytics and AI services to build a new global data platform, called ADA, and deploy it at scale across the group.
- The new data platform would provide internal business stakeholders with automated business and marketing insights, increasing operational efficiencies and attracting new customers.
- The global data platform was delivered on time and on budget, and is now live.
Challenges and Lessons Learned
- The complexity of the project was not in building the new platform, but in migrating everything from the on-premises environment to the new platform while keeping both running in parallel.
- The parallel-run phase was one of the most important and challenging parts of the project, as BBVA had to run over 50,000 processes on both platforms simultaneously.
- The pre-migration phase, focused on "housekeeping" the existing platform and retiring unnecessary components, was crucial: it reduced the scope of the migration by over 40%.
- Securing regulatory approval to move all data to the cloud was a significant effort, as BBVA was one of the first banks to do so.
- Designing the project with the goal of completely shutting down the old platform was an important consideration from the beginning.
Data Architecture
- BBVA used the AWS Modern Data Reference Architecture, combining the data mesh and data lakehouse approaches.
- Key components:
- Data ingestion using AWS services like DMS, DataSync, and EMR
- Data storage in S3 and Redshift, with metadata managed in the Glue Data Catalog
- Data processing using EMR and Spark
- Consumption layer with sandboxes, Athena, SageMaker, and BI tools
- Challenges included data catalog synchronization, parallel batch processing, and CI/CD integration; the sketches below illustrate the processing layer and one form of catalog synchronization.
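The processing layer above can be illustrated with a minimal PySpark sketch: read raw data from S3, curate it, and publish it as a partitioned table that the consumption layer (Athena, BI tools) can query. The bucket, database, and table names are hypothetical, and the cluster is assumed to be configured with the Glue Data Catalog as its Hive metastore (a standard EMR option); this is not BBVA's actual pipeline code.

```python
# Minimal PySpark sketch of the processing layer: read raw data from S3,
# curate it, and publish it as a partitioned table. All bucket, database,
# and table names are hypothetical.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = (
    SparkSession.builder
    .appName("ada-daily-transactions")
    .enableHiveSupport()          # lets saveAsTable register metadata in the configured metastore
    .getOrCreate()
)

raw = spark.read.parquet("s3://example-raw-zone/transactions/")

curated = (
    raw.filter(F.col("amount").isNotNull())
       .withColumn("ingest_date", F.current_date())
)

(
    curated.write
    .mode("overwrite")
    .partitionBy("ingest_date")
    .format("parquet")
    .option("path", "s3://example-curated-zone/transactions/")
    .saveAsTable("curated_db.transactions")   # table becomes queryable from Athena and BI tools
)
```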
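The catalog-synchronization challenge can take several forms; the boto3 sketch below shows one possible shape, a one-way copy of table definitions from a source Glue Data Catalog into a target catalog in another region. Database and region names are hypothetical, the target database is assumed to already exist, and this is not BBVA's actual synchronization mechanism.

```python
# Illustrative one-way sync of Glue table definitions between two regions.
import boto3
from botocore.exceptions import ClientError

SOURCE_REGION = "eu-west-1"   # hypothetical
TARGET_REGION = "eu-south-2"  # hypothetical
DATABASE = "curated_db"       # assumed to exist in both catalogs

src = boto3.client("glue", region_name=SOURCE_REGION)
dst = boto3.client("glue", region_name=TARGET_REGION)

paginator = src.get_paginator("get_tables")
for page in paginator.paginate(DatabaseName=DATABASE):
    for table in page["TableList"]:
        # Keep only the fields accepted by create_table/update_table;
        # get_tables also returns read-only metadata.
        table_input = {
            k: v for k, v in table.items()
            if k in ("Name", "Description", "Owner", "StorageDescriptor",
                     "PartitionKeys", "TableType", "Parameters")
        }
        try:
            dst.create_table(DatabaseName=DATABASE, TableInput=table_input)
        except ClientError as err:
            if err.response["Error"]["Code"] == "AlreadyExistsException":
                dst.update_table(DatabaseName=DATABASE, TableInput=table_input)
            else:
                raise
```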
Key Features
- Unified Console: BBVA built its own console to give end users a streamlined experience, including cost visibility and data subscription management.
- Data Subscriptions and Governance: The Glue Data Catalog and Lake Formation are used to manage data access and permissions, allowing users to self-serve while keeping data secure (see the Lake Formation sketch after this list).
- Data Exfiltration Prevention: Amazon AppStream 2.0 is used to prevent users from downloading data outside the platform.
- Parallel Migration: BBVA developed a custom solution using EMR and DistCp to migrate 4 PB of data from on premises to AWS while keeping both platforms running in parallel (see the migration sketch after this list).
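As a hedged sketch of what approving a data subscription could translate to at the governance layer, the boto3 call below grants SELECT on specific columns of a catalog table to a subscriber's IAM role through Lake Formation. The role ARN, database, table, and column names are hypothetical; BBVA's console drives this through its own workflow rather than direct calls like this one.

```python
# Grant a sandbox role column-level SELECT on a governed table.
# All identifiers below are placeholders.
import boto3

lakeformation = boto3.client("lakeformation", region_name="eu-west-1")

lakeformation.grant_permissions(
    Principal={
        "DataLakePrincipalIdentifier": "arn:aws:iam::123456789012:role/sandbox-marketing-analyst"
    },
    Resource={
        "TableWithColumns": {
            "DatabaseName": "curated_db",
            "Name": "transactions",
            # Column-level grants keep sensitive fields out of the subscription.
            "ColumnNames": ["transaction_id", "amount", "ingest_date"],
        }
    },
    Permissions=["SELECT"],
)
```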
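The migration approach can be illustrated with a boto3 sketch that submits a Hadoop DistCp step to an EMR cluster, copying a dataset from an on-premises HDFS into S3. The cluster ID, namenode URI, and paths are hypothetical, and BBVA's actual solution wraps DistCp in custom orchestration to keep both platforms consistent during the parallel run.

```python
# Submit a DistCp copy of one on-premises dataset to S3 as an EMR step.
# Cluster ID, namenode host, and paths are placeholders.
import boto3

emr = boto3.client("emr", region_name="eu-west-1")

response = emr.add_job_flow_steps(
    JobFlowId="j-EXAMPLECLUSTER",
    Steps=[
        {
            "Name": "distcp-transactions",
            "ActionOnFailure": "CONTINUE",
            "HadoopJarStep": {
                "Jar": "command-runner.jar",
                "Args": [
                    "hadoop", "distcp",
                    "-update",              # only copy files that changed since the last run
                    "-strategy", "dynamic", # balance work across mappers for large datasets
                    "hdfs://onprem-namenode:8020/data/transactions",
                    "s3://example-migration-landing/transactions/",
                ],
            },
        }
    ],
)
print("Submitted step:", response["StepIds"][0])
```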
FinOps and Cost Management
- BBVA created a "FinDataOps" team to help with budgeting, cost visibility, and governance across the data domain.
- Preventive and detective guardrails protect against unintended consumption and cost overruns, especially for the less technical end-user community (see the budget sketch after this list).
- Custom cost dashboards give users visibility into their consumption and spending (the Cost Explorer sketch after this list shows the kind of query they can be built on).
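A minimal sketch of one detective guardrail, assuming sandboxes carry a hypothetical "sandbox" cost-allocation tag: a monthly AWS Budget per sandbox that emails the owner when actual spend crosses 80% of the limit. The account ID, tag value, amount, and address are placeholders, not BBVA's actual configuration.

```python
# Create a per-sandbox monthly cost budget with an 80% actual-spend alert.
# All identifiers are placeholders; the "sandbox" tag is an assumption.
import boto3

budgets = boto3.client("budgets", region_name="us-east-1")  # Budgets is served from us-east-1

budgets.create_budget(
    AccountId="123456789012",
    Budget={
        "BudgetName": "sandbox-marketing-monthly",
        "BudgetType": "COST",
        "TimeUnit": "MONTHLY",
        "BudgetLimit": {"Amount": "5000", "Unit": "USD"},
        # Scope the budget to one sandbox via its cost-allocation tag.
        "CostFilters": {"TagKeyValue": ["user:sandbox$marketing"]},
    },
    NotificationsWithSubscribers=[
        {
            "Notification": {
                "NotificationType": "ACTUAL",
                "ComparisonOperator": "GREATER_THAN",
                "Threshold": 80.0,
                "ThresholdType": "PERCENTAGE",
            },
            "Subscribers": [
                {"SubscriptionType": "EMAIL", "Address": "sandbox-owner@example.com"}
            ],
        }
    ],
)
```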
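The cost dashboards could be fed by Cost Explorer queries along these lines: daily unblended cost grouped by the same hypothetical "sandbox" tag. The dates and tag key are placeholders.

```python
# Pull daily cost for the month, broken down by sandbox tag value.
import boto3

ce = boto3.client("ce", region_name="us-east-1")

result = ce.get_cost_and_usage(
    TimePeriod={"Start": "2024-06-01", "End": "2024-07-01"},  # placeholder dates
    Granularity="DAILY",
    Metrics=["UnblendedCost"],
    GroupBy=[{"Type": "TAG", "Key": "sandbox"}],
)

for day in result["ResultsByTime"]:
    for group in day["Groups"]:
        tag_value = group["Keys"][0]                            # e.g. "sandbox$marketing"
        amount = group["Metrics"]["UnblendedCost"]["Amount"]
        print(day["TimePeriod"]["Start"], tag_value, amount)
```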
Future Roadmap
- Expanding into unstructured data management to support generative AI use cases
- Improving data sharing and collaboration between sandboxes using Amazon DataZone
- Integrating online inference with SageMaker and exploring open table formats such as Apache Iceberg (see the sketch after this list)
- Adopting real-time data processing with Amazon Kinesis
- Leveraging newer AWS services like EMR Serverless and Glue Data Quality to improve developer productivity
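As an illustration of the Iceberg item above, the Spark SQL sketch below creates an Iceberg table and applies a row-level delete, something plain Parquet tables do not support. It assumes a Spark session already configured with an Iceberg catalog named glue_catalog backed by the Glue Data Catalog (a supported EMR setup); the database, table, and bucket names are hypothetical.

```python
# Illustrative Iceberg usage from Spark, assuming an Iceberg catalog named
# "glue_catalog" is configured on the session. All names are placeholders.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("ada-iceberg-sketch").getOrCreate()

spark.sql("""
    CREATE TABLE IF NOT EXISTS glue_catalog.curated_db.transactions_iceberg (
        transaction_id STRING,
        amount DOUBLE,
        ingest_date DATE
    )
    USING iceberg
    PARTITIONED BY (ingest_date)
    LOCATION 's3://example-curated-zone/transactions_iceberg/'
""")

# Row-level deletes are one of the features that motivate the table format.
spark.sql("""
    DELETE FROM glue_catalog.curated_db.transactions_iceberg
    WHERE ingest_date < date '2020-01-01'
""")
```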
Overall, BBVA's journey in building a global, multi-region data platform on AWS demonstrates the challenges and best practices involved in migrating a large-scale on-premises data infrastructure to the cloud while maintaining data security, cost control, and user productivity.