# Power a cost-effective RAG solution using Amazon Titan Embeddings Text (AIM358)

A detailed summary of the video session.
## Introduction to Embeddings

- Embeddings are numeric representations of text that encode semantic information.
- They enable semantic search and retrieval, improving on simple lexical (keyword) matching.
- Key concepts:
  - **Tokens**: the units of text (words or subwords) a model processes
  - **Chunk size**: how the text is broken into smaller pieces for embedding
  - **Overlap**: how much adjacent chunks share, so context is not lost at chunk boundaries
  - **Embedding models**: convert text into numeric vectors (e.g., Amazon Titan)
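A minimal sketch of chunking with overlap, using whitespace splitting as a stand-in for a real tokenizer (an assumption for illustration; Titan uses its own subword tokenizer, so real token counts will differ):

```python
def chunk_text(text: str, chunk_size: int = 128, overlap: int = 32) -> list[str]:
    """Split text into chunks of roughly `chunk_size` tokens, where each
    chunk shares its first `overlap` tokens with the end of the previous
    chunk, so context spanning a boundary still lands in one chunk."""
    tokens = text.split()  # stand-in tokenizer: whitespace words
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(" ".join(tokens[start:start + chunk_size]))
        if start + chunk_size >= len(tokens):
            break
    return chunks
```

Each chunk is then embedded separately; smaller chunks give more precise retrieval but multiply the number of embeddings (and tokens) to pay for.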
## Scaling Embeddings

Scaling embeddings comes with challenges:

- Cost of creating embeddings for documents and queries
- Supporting multiple languages and data types (e.g., code)
- Processing speed for large document collections
- Storage cost for the embeddings themselves
## Titan Text Embeddings

Titan Text Embeddings addresses these challenges:

- Low embedding-creation cost: $0.00002 per 1K tokens
- Support for English text, multilingual text, and code
- Two options for embedding throughput: online (real-time) inference or batch processing
- Reduced storage cost through two techniques:
  - **Chopping**: truncating the embedding vector to fewer dimensions
  - **Rounding**: reducing numeric precision, e.g. rounding floats down to binary (0/1) values
  - Chopping and rounding can be combined for further savings
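A sketch of chopping and rounding as NumPy post-processing on a float embedding. The 1024-dimension width and the random vector are assumptions for illustration; in practice, Titan Text Embeddings V2 can also return reduced-dimension and binary vectors directly via its API parameters, so you rarely need to do this by hand:

```python
import numpy as np

def chop(vec: np.ndarray, dims: int) -> np.ndarray:
    """Chopping: keep only the first `dims` dimensions, then
    re-normalize so cosine similarity still behaves sensibly."""
    v = vec[:dims]
    return v / np.linalg.norm(v)

def to_binary(vec: np.ndarray) -> np.ndarray:
    """Rounding to binary: 1 where a component is positive, else 0.
    Each dimension then costs 1 bit instead of 32."""
    return (vec > 0).astype(np.uint8)

# Random stand-in for a real 1024-dim float32 embedding
rng = np.random.default_rng(0)
emb = rng.standard_normal(1024).astype(np.float32)
emb /= np.linalg.norm(emb)

chopped = chop(emb, 256)      # chopping: 1024 floats -> 256 floats
binary = to_binary(chopped)   # combined: 256 floats -> 256 bits
```

The tradeoff is a modest loss of retrieval accuracy in exchange for a large drop in storage and memory footprint.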
## NetDocuments Success Story

- NetDocuments is a document management company with billions of documents
- They explored embeddings but found traditional float-based embeddings prohibitively expensive to store at that scale
- By adopting Titan binary embeddings, they achieved:
  - 90%+ reduction in storage requirements
  - 86% reduction in hardware costs for hosting the vector database
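A back-of-envelope calculation showing where a 90%+ storage reduction comes from, assuming (for illustration) 1024-dimensional float32 embeddings:

```python
dims = 1024                  # assumed embedding width
float_bytes = dims * 4       # float32: 4 bytes per dimension
binary_bytes = dims // 8     # binary: 1 bit per dimension
reduction = 1 - binary_bytes / float_bytes

print(f"{float_bytes} B -> {binary_bytes} B per vector "
      f"({reduction:.1%} smaller)")
# 4096 B -> 128 B per vector, i.e. ~97% smaller,
# consistent with the 90%+ figure cited in the talk
```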
## Architecture Approaches

- Basic approach:
  - Embed documents stored in S3; store the embeddings in OpenSearch
  - Embed each query, search OpenSearch, then fetch the matching documents from S3
- Hybrid approach:
  - Store both binary and floating-point embeddings
  - Use binary embeddings for the initial search, then re-rank with floating-point embeddings
- Advanced hybrid approach:
  - Store binary embeddings in OpenSearch
  - Store floating-point embeddings and document passages in S3
  - Use binary embeddings for the initial search, then re-rank with floating-point embeddings fetched from S3

The choice of approach depends on the tradeoffs among storage cost, search quality, and performance requirements for the specific use case.
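The hybrid approaches above can be sketched as a two-stage search. This is an in-memory NumPy illustration, not the OpenSearch/S3 implementation: stage 1 does a cheap Hamming-distance scan over binary embeddings to build a shortlist, stage 2 re-ranks the shortlist by cosine similarity on floating-point embeddings (which, in the advanced hybrid approach, would be fetched from S3). All function names and sizes are assumptions:

```python
import numpy as np

def hamming(query_bin: np.ndarray, docs_bin: np.ndarray) -> np.ndarray:
    """Hamming distance between one binary query and many binary docs."""
    return np.count_nonzero(docs_bin != query_bin, axis=1)

def two_stage_search(q_float, q_bin, docs_float, docs_bin,
                     candidates: int = 100, k: int = 10) -> np.ndarray:
    """Stage 1: binary shortlist; stage 2: float re-rank. Assumes the
    float embeddings are unit-normalized, so dot product == cosine."""
    shortlist = np.argsort(hamming(q_bin, docs_bin))[:candidates]
    scores = docs_float[shortlist] @ q_float
    return shortlist[np.argsort(-scores)][:k]
```

The binary stage keeps the expensive index small and fast, while the float re-rank recovers most of the accuracy lost to binarization, but only over the shortlist.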