AWS re:Invent 2025 - Build gpu-boosted, auto-optimized, billion-scale VectorDBs in hours (ANT213)

Building Billion-Scale Vector Databases with GPU Acceleration and Auto-Optimization

Introduction to OpenSearch and Vector Search

  • OpenSearch is a search and analytics engine governed under the Linux Foundation
  • It is offered both as managed clusters and as a serverless option for customers
  • OpenSearch has a long history in vector search, dating back to 2020
  • Vector search improves search quality, enables semantic search, and powers diverse applications such as recommendations and anomaly detection
  • OpenSearch integrates with vector search libraries such as Meta's Faiss to enable scalable vector search
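To make the Faiss integration concrete, here is a minimal sketch of the index definition OpenSearch's k-NN plugin accepts. The index name, field name, and dimension (768) are illustrative assumptions, not values from the talk; the payload would be sent as `PUT /my-vectors`.

```python
import json

# Sketch: a k-NN index definition for OpenSearch, backed by the Faiss engine
# with an HNSW graph. Field name and dimension are illustrative assumptions.
index_body = {
    "settings": {"index": {"knn": True}},   # enable the k-NN plugin for this index
    "mappings": {
        "properties": {
            "embedding": {
                "type": "knn_vector",
                "dimension": 768,            # must match the embedding model's output size
                "method": {
                    "name": "hnsw",          # graph-based ANN algorithm
                    "engine": "faiss",       # Faiss as the backing library
                    "space_type": "l2",      # Euclidean distance
                },
            }
        }
    },
}

# Sent to the cluster as: PUT /my-vectors  with this JSON body
print(json.dumps(index_body, indent=2))
```

Queries against such an index then use the same `embedding` field with a `knn` clause; consult the OpenSearch k-NN documentation for the full set of method parameters.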

Scaling Vector Databases: Challenges and Limitations

  • Building large-scale vector databases (e.g., billions of vectors) is challenging because the indexing process is resource-intensive
  • Indexing a billion vectors can take days on a CPU-based cluster, slowing productivity and innovation velocity
  • Indexes must also be rebuilt frequently as data and embedding models change, compounding the operational burden
  • Scaling vector search while maintaining low latency is difficult because indexing and search workloads compete for resources on the same infrastructure

GPU Acceleration for Vector Indexing

  • OpenSearch collaborated with NVIDIA to leverage GPU-accelerated vector indexing via the CAGRA graph algorithm from NVIDIA cuVS, integrated with Faiss
  • Benchmarks showed 6-14x faster index builds and 6-12x cost savings when using GPU instances
  • GPU acceleration maintained low search latencies even under heavy concurrent indexing workloads
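The cost claim above follows from simple arithmetic: even at a higher hourly rate, a much shorter build wins on total cost. The build times and hourly prices below are illustrative placeholders (not AWS list prices); only the speedup falls within the 6-14x range cited in the talk.

```python
# Back-of-envelope sketch of why a faster GPU build can also be cheaper.
# All hours and $/hr figures are hypothetical placeholders for illustration.

def build_cost(hours: float, hourly_rate: float) -> float:
    """Total cost of an index build on one instance."""
    return hours * hourly_rate

cpu_hours = 40.0          # hypothetical CPU-cluster build time
speedup = 10.0            # within the 6-14x range cited in the talk
gpu_hours = cpu_hours / speedup

cpu_cost = build_cost(cpu_hours, hourly_rate=2.0)   # placeholder $/hr
gpu_cost = build_cost(gpu_hours, hourly_rate=3.0)   # placeholder $/hr

print(f"CPU: {cpu_hours:.0f} h, ${cpu_cost:.0f}")   # CPU: 40 h, $80
print(f"GPU: {gpu_hours:.0f} h, ${gpu_cost:.0f}")   # GPU: 4 h, $12
print(f"Cost ratio: {cpu_cost / gpu_cost:.1f}x")    # Cost ratio: 6.7x
```

The point is structural, not numeric: as long as the speedup exceeds the GPU's price premium, total build cost drops.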

Auto-Optimization for Vector Indexes

  • Optimizing vector index configurations is critical for balancing search quality, cost, and performance
  • OpenSearch's auto-optimization framework automates testing of different algorithms, compression techniques, and index modes
  • This cuts the expertise and time needed to find an optimal index configuration from days to under an hour
  • The framework provides detailed recommendations and reports on the trade-offs between recall, latency, and cost

Serverless GPU-Accelerated Vector Indexing

  • OpenSearch provides a "serverless GPU" capability that automatically scales GPU resources to accelerate vector indexing when needed
  • Customers can enable this feature through a simple API/CLI switch, without having to manage the GPU infrastructure
  • This allows customers to benefit from GPU acceleration without the overhead of provisioning and managing GPU instances
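As a rough sketch of what a "single switch" looks like in practice, the toggle would be an index setting pushed via the settings API. The setting key below is an assumption for illustration, not confirmed from the talk; check the OpenSearch documentation for the actual key before use.

```python
import json

# Hypothetical sketch: enabling GPU-accelerated (remote) index builds with
# one settings toggle. The setting name is an ASSUMPTION for illustration.
settings_body = {
    "index": {
        "knn.remote_index_build.enabled": True  # assumed key, verify in docs
    }
}

# Would be sent as: PUT /my-vectors/_settings  with this JSON body
print(json.dumps(settings_body))
```

The operational point stands regardless of the exact key: the customer flips a declarative setting, and the service provisions and tears down GPU capacity behind the scenes.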

Business Impact and Use Cases

  • Large-scale customers such as Amazon Brand Protection use OpenSearch to index billions of vectors for automated anomaly detection
  • Startups such as DevRev AI are building agent-based applications on top of OpenSearch's vector search capabilities, achieving 88% ticket resolution without human intervention
  • The combination of GPU acceleration and auto-optimization enables customers to build and maintain responsive, high-quality vector search applications at scale
