# Capital One's Tech Journey and Data Ecosystem

*A summary of the video transcription.*
## Capital One's Technology Transformation
- Over the last decade, Capital One has rebuilt its technology stack from the ground up, adopting open source software and building much of its own technology.
- The company went all-in on the cloud, taking a serverless-first approach and becoming a deeply cloud-fluent AWS customer.
- This transformation has enabled Capital One to scale AI and ML to serve its 100 million customers.
## The Importance of Data for AI
- A nimble, flexible, and elastic tech stack, a well-managed and real-time data ecosystem, and talented personnel are key to enabling effective AI and ML.
- There is a flywheel effect between data and AI: better data leads to better AI, and better AI leads to better data insights.
## The Challenges of Data Complexity
- The 3 V's of data complexity: Volume (an estimated 147 zettabytes of data worldwide by 2025), Variety (80-90% of data is unstructured), and Velocity (real-time access required within milliseconds).
- Data quality and access issues are major impediments to effective AI: 64% of data professionals cite data quality as a top challenge, and 62% cite real-time data access as the area requiring the most attention.
## Principles for Producing and Consuming Good Data
- Self-service: Empowering the data community with tools, access, and discoverability.
- Automation: Baking data lineage, quality checks, SLAs, and governance into data processes.
- Scalable data: Avoiding point solutions and building for massive scale.
## The Data Producer Experience
### Onboarding Data
- Register metadata, privacy/security settings, and SLAs.
- Design and approve schema for structured data.
- Provision data into the right stores and formats.
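The onboarding steps above can be sketched as a single registration payload. This is an illustrative assumption, not Capital One's actual API: function and field names (`build_onboarding_request`, `pii_fields`, `freshness_minutes`) are hypothetical.

```python
# Hypothetical sketch of a dataset-onboarding request, assuming a
# registration API that captures metadata, privacy settings, and SLAs
# before any data is provisioned. All names are illustrative.

def build_onboarding_request(name, owner, schema, pii_fields, freshness_sla_minutes):
    """Assemble the metadata a producer registers before provisioning."""
    if not schema:
        raise ValueError("structured datasets must register an approved schema")
    return {
        "dataset": name,
        "owner": owner,
        "schema": schema,                       # designed and approved up front
        "privacy": {"pii_fields": pii_fields},  # drives masking and access controls
        "sla": {"freshness_minutes": freshness_sla_minutes},
    }

request = build_onboarding_request(
    name="card_transactions",
    owner="payments-team",
    schema=[{"name": "txn_id", "type": "string"},
            {"name": "amount", "type": "decimal"}],
    pii_fields=["txn_id"],
    freshness_sla_minutes=15,
)
```

Capturing schema, privacy, and SLA details in one request is what lets the platform provision the right stores and formats automatically downstream.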
### The Self-Service Portal and Control Plane
- The self-service portal abstracts complexity, automating provisioning, data quality, transformations, and observability.
- The control plane is a collection of services that configures the data pipeline and enforces governance.
### Automating Data Onboarding at Scale
- Central platform approach: Publishing data through an API that enforces governance.
- Federated model: Instrumenting Spark pipelines with a purpose-built SDK to enforce governance.
- The key is maintaining consistency in data governance and management across approaches.
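The consistency point can be illustrated with a minimal sketch: both the central publishing API and a federated Spark pipeline funnel through one shared governance check. This is not Capital One's SDK; the function names and the schema-drift rule are assumptions for illustration.

```python
# Illustrative only: one governance check shared by both onboarding paths,
# so enforcement stays consistent however data is published.

def enforce_governance(record_batch, registered_schema):
    """Reject batches whose fields drift from the approved schema."""
    allowed = {field["name"] for field in registered_schema}
    for record in record_batch:
        extra = set(record) - allowed
        if extra:
            raise ValueError(f"unregistered fields: {sorted(extra)}")
    return record_batch

# Central platform path: publish through an API that enforces governance.
def publish_via_api(batch, schema):
    return enforce_governance(batch, schema)

# Federated path: a Spark pipeline instrumented with the same check
# (in practice via a purpose-built SDK wrapping the write).
def instrumented_spark_write(batch, schema, write_fn):
    return write_fn(enforce_governance(batch, schema))
```

Because both paths call `enforce_governance`, a producer cannot bypass governance by choosing one onboarding model over the other.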
## The Data Consumer Experience
### Capital One's Lake Strategy
- Bring compute to the lake to minimize storage sprawl.
- Adopt open table formats like Delta Lake and Apache Iceberg to enable SQL-like operations (inserts, updates, deletes) directly on lake data.
- Implement a zone strategy for fit-for-purpose data access and management.
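A zone strategy can be sketched as a simple path convention: each dataset lives in a zone with fit-for-purpose access rules. The zone names (`raw`, `curated`, `consumer`) and bucket layout below are assumptions for illustration, not Capital One's actual layout.

```python
# Hedged sketch of a lake zone strategy: datasets progress through zones,
# each with its own access and management posture. Layout is illustrative.

ZONES = ("raw", "curated", "consumer")

def lake_path(zone, domain, dataset):
    """Resolve a dataset's storage location from its zone and domain."""
    if zone not in ZONES:
        raise ValueError(f"unknown zone: {zone}")
    return f"s3://data-lake-{zone}/{domain}/{dataset}/"

path = lake_path("curated", "payments", "card_transactions")
```

Keeping compute engines pointed at these shared locations, rather than copying data out, is what minimizes storage sprawl.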
### Lake Platform Capabilities
- Provisioning service to manage dataset locations and metadata.
- Access management service for temporary, scoped data access.
- Lifecycle policies and intelligent tiering for data management.
- Cross-region replication for high availability.
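Temporary, scoped access might look like the following: the access-management service mints a narrow, read-only policy for a single dataset prefix, to be attached to a short-lived credential. The policy shape follows AWS IAM JSON; the function and bucket names are assumptions.

```python
# Minimal sketch of temporary, scoped data access. In practice a policy
# like this would be attached to a short-lived STS session; here we only
# build the policy document. Names are illustrative.

def scoped_read_policy(bucket, prefix):
    """Build an IAM-style policy granting read access to one dataset prefix."""
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Action": ["s3:GetObject"],  # read-only: no writes or deletes
            "Resource": f"arn:aws:s3:::{bucket}/{prefix}/*",
        }],
    }

policy = scoped_read_policy("data-lake-curated", "payments/card_transactions")
```

Scoping access to a prefix and a time window keeps consumers self-service without widening the blast radius of any single credential.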
### Data Scientist and ML Engineer Experiences
- Data scientists can self-provision spaces for model development and collaboration.
- ML engineers can self-provision low-latency data stores (e.g., DynamoDB) for production model deployments.
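Self-provisioning a low-latency store might reduce to generating a table specification like the one below. The parameter names match boto3's DynamoDB `create_table` call; the table and key names are illustrative assumptions, and the actual API call is only noted in a comment.

```python
# Hedged sketch: an ML engineer self-provisions a DynamoDB table for
# low-latency feature lookups at serving time. Names are illustrative.

def feature_table_spec(table_name, key_name="customer_id"):
    """Build the keyword arguments for a DynamoDB create_table call."""
    return {
        "TableName": table_name,
        "KeySchema": [{"AttributeName": key_name, "KeyType": "HASH"}],
        "AttributeDefinitions": [{"AttributeName": key_name, "AttributeType": "S"}],
        "BillingMode": "PAY_PER_REQUEST",  # serverless-first: no capacity planning
    }

spec = feature_table_spec("customer_features")
# In production, this spec would be passed to
# boto3.client("dynamodb").create_table(**spec).
```

On-demand billing fits the serverless-first posture described earlier: the table scales with model traffic without upfront throughput provisioning.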
## Key Takeaways
- Streamline experiences for data producers and consumers.
- Build automation and scalable mechanisms for enforcement.
- Enable rapid experimentation for data-driven innovation.
- Ensure unwavering trustworthiness of the data ecosystem.