AWS re:Invent 2025 - Accelerating Frontier AI w/ Foundational Platform Architecture Elements (AIM282)

Accelerating Frontier AI at Capital One

AI Transformation at Capital One

  • Capital One is a 30-year-old Fortune 100 company with a strong focus on technology and data-driven innovation.
  • The company has been on a 10-year journey of AI transformation, driven by a forward-leaning CEO and a relentless focus on talent development.
  • Capital One has been recognized as a leader in AI maturity, talent, and innovation, ranking #1 in the Evident AI Index for the last 3 years and in the top 10 globally for generative and agentic AI patents.
  • The company's AI strategy is built on two key pillars: 1) Developing custom AI models and 2) Building an enterprise-grade AI platform.

Enterprise AI Platform Architecture

  • Capital One's AI platform leverages a combination of AWS services, open-source projects, and custom proprietary capabilities to enable rapid deployment of AI at scale.
  • The platform consists of two main components: a model training infrastructure and an inference platform.

Model Training Infrastructure

  • Combines AWS services, open-source tools, and custom pipelines into a high-performance, multi-tenant training infrastructure.
  • Key features include high-speed networking, scheduling and queuing, high-performance file systems, and GPU node recovery.
  • The training infrastructure has evolved over three phases, with a focus on minimizing downtime, optimizing GPU utilization, and supporting a diverse set of users and workloads.
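
The interaction between queuing and GPU node recovery can be illustrated with a toy model. The talk summary does not describe the actual scheduler, so everything below (the `Cluster` class, the priority scheme, the node and job names) is a hypothetical sketch rather than Capital One's implementation. It shows one plausible behavior: when a node fails, its job goes back into the queue with its original position, so it restarts ahead of later arrivals once capacity returns.

```python
import heapq
import itertools
from dataclasses import dataclass, field


@dataclass(order=True)
class Job:
    # Jobs are ordered by (priority, seq); a lower priority number runs first.
    priority: int
    seq: int
    name: str = field(compare=False)
    tenant: str = field(compare=False)


class Cluster:
    """Hypothetical one-job-per-node scheduler with failure requeueing."""

    def __init__(self, node_names):
        self.free_nodes = list(node_names)
        self.running = {}                 # node name -> Job
        self.queue = []                   # heap of waiting jobs
        self._seq = itertools.count()     # tie-breaker: submission order

    def submit(self, name, tenant, priority):
        heapq.heappush(self.queue, Job(priority, next(self._seq), name, tenant))
        self._dispatch()

    def _dispatch(self):
        # Place waiting jobs on free nodes until one of the two runs out.
        while self.queue and self.free_nodes:
            node = self.free_nodes.pop(0)
            self.running[node] = heapq.heappop(self.queue)

    def node_failed(self, node):
        # Take the node out of service and requeue whatever it was running.
        job = self.running.pop(node, None)
        if node in self.free_nodes:
            self.free_nodes.remove(node)
        if job is not None:
            heapq.heappush(self.queue, job)
        self._dispatch()

    def node_recovered(self, node):
        # Return a repaired or replaced node to the pool.
        self.free_nodes.append(node)
        self._dispatch()
```

A real system would also checkpoint training state so that a requeued job resumes instead of starting over; that part is omitted here.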

Inference Platform

  • Designed to optimize for cost, latency, throughput, and reliability at scale, beyond what managed service providers offer on their own.
  • Combines managed services with in-house infrastructure to balance speed, cost, and control.
  • Applies both model-level and infrastructure-level optimizations to achieve order-of-magnitude improvements in performance and cost.

Case Study: Chat Concierge

  • Chat Concierge is Capital One's first customer-facing, multi-agentic AI application, launched in 2025.
  • It leverages a custom agentic framework called Macau, which includes an understanding agent, a planner agent, an evaluator agent, and an explainer agent.
  • The application is designed to provide a natural language-based car shopping experience for customers on dealer websites, with real-time access to dealer inventory and scheduling capabilities.
  • The initial deployment of Chat Concierge faced challenges with latency and throughput, which were addressed through the platform's inference optimizations.
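
The four agent roles named above can be pictured as stages that each enrich a shared conversation turn. The summary only names the agents, so the control flow, the `Turn` record, the intent labels, and the step names below are all assumptions made for illustration; in a real system each stage would call a model rather than use keyword rules.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical set of actions the evaluator is willing to approve.
ALLOWED_STEPS = {
    "query_dealer_inventory",
    "lookup_vehicle",
    "find_open_slots",
    "book_slot",
}


@dataclass
class Turn:
    user_message: str
    intent: Optional[str] = None
    plan: Optional[List[str]] = None
    approved: bool = False
    reply: Optional[str] = None


def understand(turn: Turn) -> Turn:
    # Understanding agent: map free text to an intent (stubbed with a keyword check).
    text = turn.user_message.lower()
    turn.intent = "schedule_test_drive" if "test drive" in text else "search_inventory"
    return turn


def make_plan(turn: Turn) -> Turn:
    # Planner agent: turn the intent into an ordered list of tool calls.
    plans = {
        "schedule_test_drive": ["lookup_vehicle", "find_open_slots", "book_slot"],
        "search_inventory": ["query_dealer_inventory"],
    }
    turn.plan = plans[turn.intent]
    return turn


def evaluate(turn: Turn) -> Turn:
    # Evaluator agent: approve the plan only if every step is permitted.
    turn.approved = all(step in ALLOWED_STEPS for step in turn.plan)
    return turn


def explain(turn: Turn) -> Turn:
    # Explainer agent: describe the outcome to the customer in plain language.
    if turn.approved:
        turn.reply = "Here is what I will do: " + ", then ".join(turn.plan) + "."
    else:
        turn.reply = "Sorry, I can't help with that request."
    return turn


def run_turn(message: str) -> Turn:
    turn = Turn(user_message=message)
    for agent in (understand, make_plan, evaluate, explain):
        turn = agent(turn)
    return turn
```

Calling `run_turn("Can I book a test drive for Saturday?")` yields a turn whose plan has three steps and whose `approved` flag is set. Placing the evaluator between planning and explanation means nothing is reported back to the customer until the plan has passed a check.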

Agentic Coding Tools Integration

  • Capital One is actively deploying agentic coding tools, such as Anthropic's Claude Code, within its software development lifecycle.
  • To address the inherent risks of integrating these tools in a regulated environment, the company has built a "zero-trust AI environment" using Amazon Bedrock and a custom AI gateway.
  • The gateway implements controls like token rate limiting, observability, and sandboxing to manage the risks while enabling developers to leverage the benefits of agentic coding tools.
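
Token rate limiting is commonly implemented with a token bucket per caller. The sketch below is a minimal illustration of that general technique, not the gateway described in the talk: the `Gateway` class, its parameters, and the in-memory audit log standing in for observability are all hypothetical. Each caller gets a budget of model tokens that refills at a fixed rate, and every admission decision is recorded.

```python
import time


class TokenBucket:
    """Budget of LLM tokens that refills continuously up to a cap."""

    def __init__(self, capacity, refill_per_sec, now):
        self.capacity = capacity
        self.refill_per_sec = refill_per_sec
        self.tokens = capacity      # start with a full budget
        self.updated_at = now

    def try_consume(self, amount, now):
        # Credit tokens earned since the last call, capped at capacity.
        elapsed = max(0.0, now - self.updated_at)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_per_sec)
        self.updated_at = now
        if amount <= self.tokens:
            self.tokens -= amount
            return True
        return False


class Gateway:
    """Hypothetical gateway front end: one bucket per caller plus an audit log."""

    def __init__(self, capacity, refill_per_sec, clock=time.monotonic):
        self._capacity = capacity
        self._refill = refill_per_sec
        self._clock = clock         # injectable so tests can control time
        self._buckets = {}
        self.audit_log = []         # (caller, requested_tokens, allowed)

    def admit(self, caller, requested_tokens):
        now = self._clock()
        bucket = self._buckets.get(caller)
        if bucket is None:
            bucket = TokenBucket(self._capacity, self._refill, now)
            self._buckets[caller] = bucket
        allowed = bucket.try_consume(requested_tokens, now)
        self.audit_log.append((caller, requested_tokens, allowed))
        return allowed
```

A caller that exhausts its budget is rejected until enough time has passed for the bucket to refill, while other callers are unaffected. The injectable `clock` is a design choice that keeps the limiter deterministic under test.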

Talent and Partnerships

  • Attracting and developing top talent is a key differentiator for Capital One's AI efforts.
  • The company has been ranked #1 in AI talent for the last 3 years and has established research partnerships with leading universities to build a strong pipeline of AI talent.
  • These partnerships include research centers, summer internships, and an AI engineering internship program, allowing the company to leverage academic expertise and provide real-world experience to students.

Conclusion

  • Capital One's approach to enterprise AI is centered around building a robust, customizable, and well-managed platform that can keep pace with the rapid advancements in AI technology.
  • By leveraging a hybrid model of AWS services, open-source tools, and custom capabilities, the company has been able to innovate at the frontier of AI while maintaining the necessary controls and governance for a regulated industry.
  • The company's focus on talent development and strategic partnerships further strengthens its position as a leader in enterprise AI transformation.