Power a cost-effective RAG solution using Amazon Titan Embeddings Text (AIM358)

Introduction to Embeddings

  • Embeddings are numeric vector representations of text that encode semantic meaning
  • They enable semantic search and retrieval, improving on simple lexical (keyword) matching
  • Key concepts:
    • Tokens: the word or sub-word units an embedding model processes
    • Chunk size: how large each piece of text is when a document is split for embedding
    • Overlap: how much adjacent chunks share, so context is not lost at chunk boundaries
    • Embedding models: convert text into numeric embeddings (e.g. Amazon Titan)
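
The chunking concepts above can be made concrete with a minimal sketch (the helper below is hypothetical and counts words for simplicity; real pipelines count model tokens):

```python
def chunk_text(text: str, chunk_size: int, overlap: int) -> list[str]:
    """Split text into chunks of at most `chunk_size` words, with `overlap`
    words shared between consecutive chunks so context survives the cut."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break
    return chunks

doc = " ".join(f"w{i}" for i in range(10))
chunk_text(doc, chunk_size=4, overlap=1)
# -> ['w0 w1 w2 w3', 'w3 w4 w5 w6', 'w6 w7 w8 w9']
```

Each chunk is then sent to the embedding model; the overlap means a sentence that straddles a chunk boundary still appears whole in at least one chunk.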

Scaling Embeddings

  • Scaling embeddings comes with challenges:
    1. Cost of creating embeddings for documents and queries
    2. Supporting multiple languages and data types (e.g. code)
    3. Processing speed for large document collections
    4. Storage cost for the embeddings

Titan Text Embeddings

  • Titan Text Embeddings addresses these challenges:
    • Low embedding cost: $0.00002 per 1K tokens
    • Support for English, multilingual text, and code
    • Two options for embedding throughput: real-time inference or batch processing
  • Storage cost can be reduced through techniques such as:
    • Chopping: truncating the embeddings to fewer dimensions
    • Rounding: reducing precision, e.g. rounding to integers or to binary (0/1) values
    • Combining chopping and rounding
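
Chopping and rounding can be illustrated with a short numpy sketch (illustrative only; the function names are mine, not the Titan API):

```python
import numpy as np

def chop(embedding: np.ndarray, dims: int) -> np.ndarray:
    """Keep only the first `dims` dimensions, then re-normalize so cosine
    similarity stays meaningful (valid for models trained for truncation)."""
    v = embedding[:dims].astype(np.float32)
    return v / np.linalg.norm(v)

def binarize(embedding: np.ndarray) -> np.ndarray:
    """Round each dimension to one bit: 1 if positive, else 0. Packed as
    bits, a 1024-dim float32 vector (4 KB) shrinks to 128 bytes (32x)."""
    return (embedding > 0).astype(np.uint8)

rng = np.random.default_rng(0)
vec = rng.standard_normal(1024).astype(np.float32)  # stand-in embedding

small = chop(vec, 256)            # chopping: 4x fewer floats
bits = binarize(vec)              # rounding: 1 bit per dimension
both = binarize(chop(vec, 256))   # combined: 256 bits per document
```

Similarity over binary vectors is typically computed with Hamming distance, which is far cheaper than floating-point dot products.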

NetDocuments Success Story

  • NetDocuments is a document management company with billions of documents
  • They explored embeddings but found traditional float-based embeddings prohibitively expensive to store at that scale
  • By adopting Titan binary embeddings, they achieved:
    • 90%+ reduction in storage requirements
    • 86% reduction in hardware costs for hosting the vector database

Architecture Approaches

  1. Basic approach:
    • Embed documents from S3 and store the embeddings in OpenSearch
    • Embed each query, search OpenSearch, then fetch the matching documents from S3
  2. Hybrid approach:
    • Store both binary and floating-point embeddings
    • Use the binary embeddings for the initial search, then re-rank with the floating-point embeddings
  3. Advanced hybrid approach:
    • Store only binary embeddings in OpenSearch
    • Store floating-point embeddings and document passages in S3
    • Use the binary embeddings for the initial search, then re-rank with the floating-point embeddings fetched from S3
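
The two-pass retrieval behind the hybrid approaches can be sketched with synthetic data (illustrative numpy code; in the architectures above, the first pass would run inside OpenSearch rather than in application code):

```python
import numpy as np

def hamming_search(query_bits, index_bits, k):
    """Cheap first pass: rank all documents by Hamming distance over their
    binary embeddings and return the k closest candidate indices."""
    dists = np.count_nonzero(index_bits != query_bits, axis=1)
    return np.argsort(dists)[:k]

def rerank(query_vec, index_vecs, candidates, k):
    """Accurate second pass: cosine similarity (dot product of unit vectors)
    computed only over the shortlisted candidates."""
    sims = index_vecs[candidates] @ query_vec
    return candidates[np.argsort(-sims)[:k]]

rng = np.random.default_rng(1)
docs = rng.standard_normal((1000, 64)).astype(np.float32)
docs /= np.linalg.norm(docs, axis=1, keepdims=True)   # float embeddings
doc_bits = (docs > 0).astype(np.uint8)                # binary embeddings

# A query that is a slightly perturbed copy of document 42
query = docs[42] + 0.02 * rng.standard_normal(64).astype(np.float32)
query /= np.linalg.norm(query)
query_bits = (query > 0).astype(np.uint8)

shortlist = hamming_search(query_bits, doc_bits, k=50)  # binary pass
top = rerank(query, docs, shortlist, k=5)               # float re-rank
```

The binary pass touches every document but compares only bits; the expensive floating-point comparison then runs over 50 candidates instead of all 1,000.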

The choice of approach depends on the tradeoffs between storage cost, search quality, and performance requirements for the specific use case.
