Power a cost-effective RAG solution using Amazon Titan Embeddings Text (AIM358)


Introduction to Embeddings

Embeddings are numeric representations of text that encode semantic information
They enable semantic search and retrieval that goes beyond simple lexical (keyword) matching
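To make the contrast with lexical matching concrete, here is a minimal sketch of similarity search over embeddings. It assumes cosine similarity as the scoring function (a common choice, not one the talk specifies), and the vectors are tiny made-up examples; real embeddings come from a model such as Titan and have far more dimensions. The function names are hypothetical.

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two embedding vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def rank_by_similarity(query_vec: list[float], doc_vecs: dict[str, list[float]]) -> list[str]:
    """Return document ids sorted from most to least similar to the query."""
    scores = {doc_id: cosine_similarity(query_vec, vec) for doc_id, vec in doc_vecs.items()}
    return sorted(scores, key=scores.get, reverse=True)

# Toy 3-dimensional vectors, invented for illustration only.
docs = {
    "car-repair": [0.9, 0.1, 0.0],
    "auto-maintenance": [0.85, 0.15, 0.05],
    "baking-bread": [0.0, 0.2, 0.95],
}
query = [0.88, 0.12, 0.02]
print(rank_by_similarity(query, docs))  # ['car-repair', 'auto-maintenance', 'baking-bread']
```

Because ranking depends on vector direction rather than shared words, two documents about the same topic can score highly even when their wording differs.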
Key concepts:
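- Tokens: Words or sub-word units of text that a model processes
- Chunk size: The length (usually counted in tokens) of the smaller pieces a document is split into before embedding
- Overlap: The number of tokens shared between consecutive chunks, so context at chunk boundaries is not lost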
- Embedding models: Used to convert text into numeric embeddings (e.g. Amazon Titan)
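The interaction between chunk size and overlap can be sketched as a sliding window over tokens. This is a minimal illustration that assumes a plain whitespace split as the tokenizer; a real pipeline would count tokens with the embedding model's own tokenizer, and the function name `chunk_tokens` is hypothetical.

```python
def chunk_tokens(tokens: list[str], chunk_size: int, overlap: int) -> list[list[str]]:
    """Split tokens into windows of chunk_size, each sharing `overlap` tokens with the previous one."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")
    step = chunk_size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break  # this window already reaches the end of the text
    return chunks

text = "embeddings turn text into vectors that capture meaning for search"
tokens = text.split()  # whitespace split stands in for the model's real tokenizer
for chunk in chunk_tokens(tokens, chunk_size=4, overlap=1):
    print(chunk)
```

Larger overlap preserves more context across boundaries but produces more chunks, and therefore more embeddings to create and store.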

Scaling Embeddings

Scaling embeddings comes with challenges:
1. Cost of creating embeddings for documents and queries
2. Supporting multiple languages and data types (e.g. code)
3. Processing speed for large document collections
4. Storage cost for the embeddings
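A back-of-the-envelope estimate shows how the creation-cost and storage challenges grow with collection size. This sketch uses the $0.00002 per 1K tokens price quoted for Titan in this summary; the chunk count, tokens per chunk, and 1,024-dimension vector size in the example are assumptions chosen only for illustration, and `estimate_scale` is a hypothetical helper.

```python
def estimate_scale(num_chunks: int, tokens_per_chunk: int, dims: int,
                   price_per_1k_tokens: float = 0.00002) -> dict:
    """Rough one-time cost to embed a corpus, plus raw float32 storage for the vectors."""
    total_tokens = num_chunks * tokens_per_chunk
    embed_cost_usd = total_tokens / 1000 * price_per_1k_tokens
    float32_bytes = num_chunks * dims * 4  # 4 bytes per float32 dimension, no index overhead
    return {
        "total_tokens": total_tokens,
        "embed_cost_usd": embed_cost_usd,
        "float32_storage_gb": float32_bytes / 1e9,
    }

# Hypothetical corpus: one billion chunks of ~500 tokens, 1,024-dimension vectors.
estimate = estimate_scale(num_chunks=1_000_000_000, tokens_per_chunk=500, dims=1024)
print(estimate)
```

With these assumed inputs, embedding roughly 500 billion tokens costs about $10,000, while the float32 vectors alone occupy about 4,096 GB before any index overhead.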

Titan Text Embeddings

Titan text embeddings address these challenges:
- Low cost for embedding creation: $0.00002 per 1K tokens
- Support for English, multilingual text, and code
- Two processing options: online inference or batch processing
Storage cost can be reduced through several techniques:
- Chopping: Reducing the number of dimensions in each embedding
- Rounding: Reducing precision by rounding values to integers or converting them to binary (0/1)
- Combining chopping and rounding
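The chopping and rounding techniques can be illustrated client-side on a single float vector. This is a sketch of the general ideas, not of how Titan implements them: `chop` keeps the leading dimensions and re-normalizes (which only preserves meaning well for models trained to support truncated outputs), `round_int8` assumes signed 8-bit integers as the integer rounding target, and `binarize` keeps one sign bit per dimension. All three function names are hypothetical, and the random vector stands in for a real embedding.

```python
import math
import random

def chop(vec: list[float], k: int) -> list[float]:
    """Chopping: keep the first k dimensions, then re-normalize to unit length."""
    head = vec[:k]
    norm = math.sqrt(sum(x * x for x in head)) or 1.0  # guard against an all-zero slice
    return [x / norm for x in head]

def round_int8(vec: list[float]) -> list[int]:
    """Rounding (assumed int8 target): map unit-vector components in [-1, 1] to [-127, 127]."""
    return [max(-127, min(127, round(x * 127))) for x in vec]

def binarize(vec: list[float]) -> bytes:
    """Rounding to binary: 1 if a component is positive, else 0, packed 8 bits per byte."""
    bits = [1 if x > 0 else 0 for x in vec]
    packed = bytearray()
    for i in range(0, len(bits), 8):
        byte = 0
        for bit in bits[i:i + 8]:
            byte = (byte << 1) | bit
        packed.append(byte)
    return bytes(packed)

# A random unit vector stands in for a real 1,024-dimension float embedding.
random.seed(0)
raw = [random.gauss(0, 1) for _ in range(1024)]
norm = math.sqrt(sum(x * x for x in raw))
vec = [x / norm for x in raw]

sizes = {
    "float32": len(vec) * 4,                          # 4 bytes per dimension
    "int8": len(round_int8(vec)),                     # 1 byte per dimension
    "binary": len(binarize(vec)),                     # 1 bit per dimension
    "chop256+binary": len(binarize(chop(vec, 256))),  # both techniques combined
}
print(sizes)  # {'float32': 4096, 'int8': 1024, 'binary': 128, 'chop256+binary': 32}
```

For a 1,024-dimension vector, binary storage is 128 bytes versus 4,096 bytes for float32, a reduction of about 97%, which is consistent with the 90%+ storage savings reported in the success story that follows.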

Net DOS Success Story

Net DOS is a document management company that handles billions of documents
They explored embeddings but found that storing traditional floating-point embeddings at that scale was prohibitively expensive
Adopted Titan binary embeddings, achieving:
- 90%+ reduction in storage requirements
- 86% reduction in hardware costs for hosting the vector database

Architecture Approaches

Basic approach:
- Embed documents stored in S3, store the embeddings in OpenSearch
- Embed queries, search OpenSearch, fetch the matching documents from S3
Hybrid approach:
- Store both binary and floating-point embeddings
- Use binary embeddings for the initial search, then re-rank the results using floating-point embeddings
Advanced hybrid approach:
- Store binary embeddings in OpenSearch
- Store floating-point embeddings and document passages in S3
- Use binary embeddings for the initial search, then re-rank using floating-point embeddings fetched from S3
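The hybrid approaches amount to a two-stage search: a cheap scan over compact binary codes to shortlist candidates, then a re-rank of only those candidates with full-precision vectors. The sketch below assumes Hamming distance for the binary stage and cosine similarity for the re-rank, and uses two in-memory dictionaries as hypothetical stand-ins for the OpenSearch index (binary codes) and the S3 store (float vectors) of the advanced hybrid approach; it does not call the OpenSearch or S3 APIs, and all function names are hypothetical.

```python
import math
import random

def binarize(vec: list[float]) -> int:
    """Sign-bit code stored as a Python int: bit i is set when vec[i] > 0."""
    code = 0
    for i, x in enumerate(vec):
        if x > 0:
            code |= 1 << i
    return code

def hamming(a: int, b: int) -> int:
    """Number of differing bits between two binary codes."""
    return bin(a ^ b).count("1")

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def two_stage_search(query_vec, binary_index, float_store, k_candidates, k_final):
    q_code = binarize(query_vec)
    # Stage 1: rank every document by Hamming distance on the compact binary codes.
    candidates = sorted(binary_index, key=lambda d: hamming(q_code, binary_index[d]))[:k_candidates]
    # Stage 2: look up full-precision vectors for the shortlist only and re-rank by cosine.
    reranked = sorted(candidates, key=lambda d: cosine(query_vec, float_store[d]), reverse=True)
    return reranked[:k_final]

# Hypothetical corpus: random 64-dimension vectors stand in for real embeddings.
random.seed(7)
DIMS = 64
float_store = {f"doc-{i}": [random.gauss(0, 1) for _ in range(DIMS)] for i in range(1000)}  # "S3"
binary_index = {doc_id: binarize(v) for doc_id, v in float_store.items()}  # "OpenSearch"

# Query: a slightly perturbed copy of doc-42, so doc-42 should come back first.
query = [x + random.gauss(0, 0.01) for x in float_store["doc-42"]]
results = two_stage_search(query, binary_index, float_store, k_candidates=50, k_final=5)
print(results)
```

The `k_candidates` parameter controls the tradeoff: a larger shortlist gives the float re-rank more chances to recover results the binary stage ranked too low, at the cost of fetching more full-precision vectors.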

The choice of approach depends on the tradeoffs between storage cost, search quality, and performance requirements for the specific use case.

Your Digital Journey deserves a great story.

Build one with us.
