BBVA: Building a multi-region, multi-country data platform at scale (FSI310)
Introduction
BBVA, a global financial institution with a presence in more than 25 countries and 70 million customers, announced a strategic partnership with AWS in June 2023 to become a data- and AI-driven organization.
The goal was to leverage AWS analytics and AI services to create a new global data platform, called ADA, that would be deployed globally at scale.
The new data platform would provide internal business stakeholders with automated business and marketing insights, increasing operational efficiencies and attracting new customers.
The global data platform was delivered on time and on budget, and is now live.
Challenges and Lessons Learned
The complexity of the project lay not in building the new platform, but in migrating everything from on-premises to the new platform while keeping both platforms running in parallel.
The parallel phase was one of the most important and challenging parts of the project, as the team had to run over 50,000 processes on both platforms simultaneously.
The pre-migration phase, in which the team focused on "housekeeping" the existing platform and removing unnecessary components, was crucial and reduced the scope of the migration by over 40%.
Securing regulatory approval to move all data to the cloud was a significant effort, as BBVA was one of the first banks to do so.
Designing the project with the goal of completely shutting down the old platform was an important consideration from the beginning.
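Running the same processes on both platforms only builds confidence if their outputs are compared. The summary does not say how BBVA validated the parallel runs, so the following is a minimal sketch, assuming each process output can be reduced to a row count and an order-independent checksum. The `RunResult` type, the `reconcile` function, and the process names are hypothetical.

```python
from dataclasses import dataclass
import hashlib


@dataclass(frozen=True)
class RunResult:
    """Summary of one process output on one platform (hypothetical shape)."""
    row_count: int
    checksum: str


def checksum_rows(rows):
    """Order-independent checksum: hash each row, then hash the sorted digests."""
    digests = sorted(hashlib.sha256(repr(r).encode()).hexdigest() for r in rows)
    return hashlib.sha256("".join(digests).encode()).hexdigest()


def summarize(rows):
    """Reduce an iterable of rows to a RunResult."""
    rows = list(rows)
    return RunResult(row_count=len(rows), checksum=checksum_rows(rows))


def reconcile(legacy, cloud):
    """Compare per-process results from both platforms.

    Returns a dict mapping each process name to one of:
    "match", "mismatch", "missing_on_cloud", "missing_on_legacy".
    """
    report = {}
    for name in sorted(set(legacy) | set(cloud)):
        if name not in cloud:
            report[name] = "missing_on_cloud"
        elif name not in legacy:
            report[name] = "missing_on_legacy"
        elif legacy[name] == cloud[name]:
            report[name] = "match"
        else:
            report[name] = "mismatch"
    return report


# Illustrative data only: two processes run on both platforms.
legacy = {
    "daily_balances": summarize([("acc1", 100), ("acc2", 250)]),
    "card_txns": summarize([("t1", 9.99)]),
}
cloud = {
    # Same rows in a different order still count as a match.
    "daily_balances": summarize([("acc2", 250), ("acc1", 100)]),
    # An extra row on the cloud side is reported as a mismatch.
    "card_txns": summarize([("t1", 9.99), ("t2", 5.00)]),
}
print(reconcile(legacy, cloud))
```

With tens of thousands of processes, a report of this kind would let a team focus on the mismatches rather than inspecting every output by hand.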
Data Architecture
BBVA used the AWS Modern Data Reference Architecture, combining the data mesh and data lakehouse approaches.
Key components:
Data ingestion using AWS services like DMS, DataSync, and EMR
Data storage in S3 and Redshift, with metadata in the Glue Data Catalog
Data processing using EMR and Spark
Consumption layer with sandboxes, Athena, SageMaker, and BI tools
Challenges included data catalog synchronization, parallel batch processing, and CI/CD integration.
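Data catalog synchronization is listed as a challenge without further detail. As an illustration only, the sketch below detects drift between two catalog snapshots, assuming each snapshot is available as a mapping from table name to a column-to-type mapping. The `diff_catalogs` helper and the sample tables are hypothetical, and the code does not call any AWS API.

```python
def diff_catalogs(source, target):
    """Compare two catalog snapshots.

    Each snapshot maps table name -> {column name: type string}.
    Returns tables missing from target, tables only in target, and
    per-column differences for tables present in both.
    """
    missing = sorted(set(source) - set(target))
    extra = sorted(set(target) - set(source))
    drifted = {}
    for table in sorted(set(source) & set(target)):
        src_cols, tgt_cols = source[table], target[table]
        # Record (source type, target type) for every column that differs;
        # None means the column is absent on that side.
        changes = {
            col: (src_cols.get(col), tgt_cols.get(col))
            for col in sorted(set(src_cols) | set(tgt_cols))
            if src_cols.get(col) != tgt_cols.get(col)
        }
        if changes:
            drifted[table] = changes
    return {"missing": missing, "extra": extra, "drifted": drifted}


# Illustrative snapshots only.
on_prem_catalog = {
    "customers": {"id": "bigint", "country": "string"},
    "accounts": {"id": "bigint", "balance": "decimal(18,2)"},
}
cloud_catalog = {
    "customers": {"id": "bigint", "country": "string", "segment": "string"},
}

print(diff_catalogs(on_prem_catalog, cloud_catalog))
```

In this example, `accounts` has not yet been registered in the cloud catalog and `customers` has gained a `segment` column there.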
Key Features
Unified Console: BBVA built their own console to provide a streamlined experience for end users, including cost visibility and data subscription management.
Data Subscriptions and Governance: The Glue Data Catalog and Lake Formation are used to manage data access and permissions, allowing users to self-serve while ensuring data security.
Data Exfiltration Prevention: Amazon AppStream is used to prevent users from downloading data outside the platform.
Parallel Migration: BBVA developed a custom solution using EMR and DistCp to migrate 4 PB of data from on-premises to AWS while keeping both platforms running in parallel.
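The custom migration tooling is not described in detail. As a rough sketch of the kind of step such a solution might wrap, the snippet below assembles a `hadoop distcp` invocation for an incremental copy from HDFS to S3. The host name, paths, bucket name, and mapper count are placeholders, and `build_distcp_command` is a hypothetical helper; `-update` and `-m` are standard DistCp options.

```python
import shlex


def build_distcp_command(source, destination, max_maps=20, update=True):
    """Build the argument list for a DistCp copy job.

    -update copies only files that are missing or differ at the destination,
    which suits repeated syncs while both platforms stay live.
    -m caps the number of simultaneous map tasks used for copying.
    """
    if max_maps < 1:
        raise ValueError("max_maps must be at least 1")
    cmd = ["hadoop", "distcp"]
    if update:
        cmd.append("-update")
    cmd += ["-m", str(max_maps), source, destination]
    return cmd


# Placeholder locations; the s3:// scheme assumes the job runs on EMR.
cmd = build_distcp_command(
    "hdfs://legacy-nn:8020/data/raw/customers",
    "s3://example-ada-raw/customers",
    max_maps=50,
)
print(shlex.join(cmd))
```

Returning an argument list rather than a single string keeps the command safe to pass to `subprocess.run` without shell quoting issues.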
FinOps and Cost Management
BBVA created a "FinDataOps" team to help with budgeting, cost visibility, and governance in the data realm.
Preventive and detective guardrails are used to protect against unintended consumption and cost overruns, especially for the less technical end-user community.
Custom cost dashboards are provided to give users visibility into their consumption and spending.
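The difference between preventive and detective guardrails can be shown with a small example. This is a sketch under stated assumptions, not BBVA's implementation: the per-sandbox monthly budget, the 80% alert threshold, and all names below are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class SandboxBudget:
    """Monthly budget for one sandbox (hypothetical model)."""
    monthly_limit_usd: float
    spent_usd: float = 0.0
    alert_threshold: float = 0.8  # fraction of the limit that triggers an alert


def preventive_check(budget, estimated_cost_usd):
    """Preventive guardrail: decide before a job runs whether it fits the budget."""
    return budget.spent_usd + estimated_cost_usd <= budget.monthly_limit_usd


def detective_check(budget):
    """Detective guardrail: classify spend that has already been recorded."""
    if budget.spent_usd >= budget.monthly_limit_usd:
        return "over_budget"
    if budget.spent_usd >= budget.alert_threshold * budget.monthly_limit_usd:
        return "alert"
    return "ok"


budget = SandboxBudget(monthly_limit_usd=1000.0, spent_usd=850.0)
print(preventive_check(budget, 100.0))  # job fits within the remaining budget
print(preventive_check(budget, 200.0))  # job would exceed the limit
print(detective_check(budget))          # spend is past the alert threshold
```

The preventive check stops a costly job before it starts, while the detective check surfaces overspending after the fact, which is the kind of signal a cost dashboard could display.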
Future Roadmap
Expanding to unstructured data management to support use cases for generative AI
Improving data sharing and collaboration between sandboxes using Amazon DataZone
Integrating online inference with SageMaker and exploring new table formats such as Apache Iceberg
Adopting real-time data processing with Amazon Kinesis
Leveraging newer AWS services like EMR Serverless and Glue Data Quality to improve developer productivity
Overall, BBVA's journey in building a global, multi-region data platform on AWS illustrates the challenges and best practices involved in migrating a large-scale on-premises data infrastructure to the cloud while ensuring data security, cost control, and user productivity.