Generative AI meets multi-tenancy: Inside a working solution (SAS407)

The video transcript covers a detailed discussion of building multi-tenant architectures for generative AI SaaS (Software as a Service) solutions, with a focus on resolving common SaaS architecture challenges. The key takeaways from the transcript are:

SaaS Architecture Challenges

Scaling Language Models: Serving even a single customer with a large language model (LLM) can be complex. Scaling to thousands of tenants adds further challenges, such as managing multiple LLMs, isolating customer data, and optimizing cost.

Two Popular Generative AI Architectures

Retrieval-Augmented Generation (RAG): Uses a generic LLM and embeds customer data into a vector store. Relevant data is retrieved at request time and added to the prompt to augment the LLM response.
Fine-tuning: Trains the LLM further on customer data so that the knowledge is embedded in the model itself, reducing the context needed in each request.
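
To make the difference between the two approaches concrete, here is a minimal sketch of how the request sent to the model differs in each case. It is an illustration under stated assumptions, not the talk's implementation: the word-overlap retriever is a toy stand-in for a real vector search, and `ModelRequest`, `build_rag_request`, `build_fine_tuned_request`, and the model identifiers are hypothetical names.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ModelRequest:
    model_id: str  # which model receives the prompt
    prompt: str    # the full text sent to that model


def retrieve(question: str, documents: List[str], top_k: int = 1) -> List[str]:
    """Toy retriever: rank documents by words shared with the question.

    A real RAG system would compare embeddings in a vector store instead.
    """
    q_words = set(question.lower().split())
    ranked = sorted(
        documents,
        key=lambda d: len(q_words & set(d.lower().split())),
        reverse=True,
    )
    return ranked[:top_k]


def build_rag_request(question: str, documents: List[str],
                      base_model_id: str = "generic-llm") -> ModelRequest:
    """RAG: a generic model plus tenant context injected into every prompt."""
    context = "\n".join(retrieve(question, documents))
    prompt = f"Context:\n{context}\n\nQuestion: {question}"
    return ModelRequest(base_model_id, prompt)


def build_fine_tuned_request(question: str, tenant_model_id: str) -> ModelRequest:
    """Fine-tuning: the tenant's own model, with no extra context in the prompt."""
    return ModelRequest(tenant_model_id, question)
```

The RAG request grows with the retrieved context on every call, while the fine-tuned request stays short but depends on a model trained per tenant.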

AWS Services for Multi-Tenant SaaS

Amazon Bedrock Knowledge Bases: A managed RAG service that abstracts the complexity of connecting data sources, a vector store, and an LLM.
Amazon Bedrock custom models: A feature to fine-tune LLMs for each tenant without the need to host and manage the models yourself.

Basic Tier vs. Premium Tier Architectures

Basic Tier: Focuses on using shared services (pool pattern) and a RAG approach to optimize costs.
Premium Tier: Utilizes dedicated resources (silo pattern) and a combination of RAG and fine-tuning to provide the best user experience.
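
One way to picture the two tiers is as a lookup that resolves which resources a request should use. The sketch below is an assumption-laden illustration: the resource identifiers, the `TenantResources` type, and the idea of scoping the shared knowledge base with a `tenant_id` metadata filter are hypothetical and not taken from the talk.

```python
from dataclasses import dataclass
from typing import Optional

SHARED_KB_ID = "kb-shared"     # hypothetical pooled knowledge base
BASE_MODEL_ID = "generic-llm"  # hypothetical shared base model


@dataclass(frozen=True)
class TenantResources:
    knowledge_base_id: str
    model_id: str
    retrieval_filter: Optional[dict]  # only needed when the store is shared


def resolve_resources(tenant_id: str, tier: str) -> TenantResources:
    """Map a tenant and its tier to the resources a request should use."""
    if tier == "basic":
        # Pool pattern: shared resources, with retrieval scoped to the tenant.
        return TenantResources(SHARED_KB_ID, BASE_MODEL_ID, {"tenant_id": tenant_id})
    if tier == "premium":
        # Silo pattern: dedicated knowledge base and a tenant-specific model.
        return TenantResources(f"kb-{tenant_id}", f"custom-model-{tenant_id}", None)
    raise ValueError(f"unknown tier: {tier!r}")
```

Keeping the tier decision in one function means the request path stays identical for both tiers and only the resolved resources differ.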

Key Architecture Challenges and Solutions

Tenant Isolation: Leveraging IAM roles, AWS Security Token Service (STS), and data access policies to ensure each tenant can only access its own resources.
Cost per Tenant: Capturing metrics such as input and output tokens with tenant context, aggregating them per tenant, and multiplying each tenant's share of total usage by the total service cost to derive the cost per tenant.
Noisy Neighbor: Implementing tenant-specific token usage plans and real-time token usage tracking to throttle requests at the API level.
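
For the tenant isolation point above, a common approach is to generate a tenant-scoped policy at request time and pass it as a session policy when assuming a role through STS, so the resulting credentials are limited to the intersection of the role's permissions and that policy. The sketch below only builds such a policy document. The S3 layout (one prefix per tenant), the bucket name, and both function names are assumptions for illustration, and `allows` is a toy check, not IAM's real evaluation logic.

```python
import fnmatch
import json
import re


def tenant_session_policy(tenant_id: str, bucket: str) -> str:
    """Build a policy JSON string that only allows reads under the tenant's prefix.

    The result could be supplied as the session policy of an STS AssumeRole call.
    """
    # Reject wildcards or path characters that could widen the policy.
    if not re.fullmatch(r"[A-Za-z0-9-]+", tenant_id):
        raise ValueError(f"invalid tenant id: {tenant_id!r}")
    policy = {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Action": ["s3:GetObject"],
            "Resource": [f"arn:aws:s3:::{bucket}/{tenant_id}/*"],
        }],
    }
    return json.dumps(policy)


def allows(policy_json: str, action: str, resource_arn: str) -> bool:
    """Toy check for demonstration only: exact action match, glob resource match."""
    for stmt in json.loads(policy_json)["Statement"]:
        if (stmt["Effect"] == "Allow"
                and action in stmt["Action"]
                and any(fnmatch.fnmatchcase(resource_arn, p) for p in stmt["Resource"])):
            return True
    return False
```

Validating the tenant identifier before it is interpolated matters here, because a value such as `*` would otherwise produce a policy that spans every tenant's data.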
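
The cost-per-tenant calculation above can be worked through with a short sketch. It assumes usage records shaped as dictionaries with `tenant_id`, `input_tokens`, and `output_tokens` keys; that record shape, the function name, and the optional weights (for cases where output tokens are priced differently from input tokens) are illustrative assumptions.

```python
from collections import defaultdict
from typing import Dict, Iterable


def cost_per_tenant(usage_records: Iterable[dict], total_service_cost: float,
                    input_weight: float = 1.0,
                    output_weight: float = 1.0) -> Dict[str, float]:
    """Split a total service cost across tenants in proportion to token usage."""
    weighted = defaultdict(float)
    # Aggregate the (optionally weighted) token counts per tenant.
    for record in usage_records:
        weighted[record["tenant_id"]] += (
            input_weight * record["input_tokens"]
            + output_weight * record["output_tokens"]
        )
    total = sum(weighted.values())
    if total == 0:
        return {tenant: 0.0 for tenant in weighted}
    # Each tenant pays its share of total usage times the total cost.
    return {tenant: total_service_cost * usage / total
            for tenant, usage in weighted.items()}
```

For example, if one tenant accounts for 800 of 1,000 tokens in a period where the service cost 50, that tenant is attributed 40 and the remaining tenants share 10.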
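
The throttling idea above, a per-tenant token usage plan checked against live usage, can be sketched with a simple fixed-window counter. This is a minimal in-memory illustration, assuming a plan expressed as tokens per time window; `UsagePlan` and `TokenThrottle` are hypothetical names, and a real multi-instance API would keep the counters in a shared store.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict, Tuple


@dataclass(frozen=True)
class UsagePlan:
    tokens_per_window: int  # token budget for one window
    window_seconds: float   # length of the window


class TokenThrottle:
    """Tracks tokens used per tenant in the current window and enforces the plan."""

    def __init__(self, plans: Dict[str, UsagePlan],
                 clock: Callable[[], float] = time.monotonic):
        self._plans = plans
        self._clock = clock
        # tenant_id -> (window start time, tokens used in that window)
        self._windows: Dict[str, Tuple[float, int]] = {}

    def try_consume(self, tenant_id: str, tokens: int) -> bool:
        """Return True and record the usage if the tenant is within its plan."""
        plan = self._plans[tenant_id]
        now = self._clock()
        start, used = self._windows.get(tenant_id, (now, 0))
        if now - start >= plan.window_seconds:
            start, used = now, 0  # the window expired, so start a new one
        allowed = used + tokens <= plan.tokens_per_window
        if allowed:
            used += tokens
        self._windows[tenant_id] = (start, used)
        return allowed
```

An API layer could call `try_consume` before invoking the model and reject the request (for example with HTTP 429) when it returns False, so one tenant exhausting its plan does not affect the others.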

The transcript also includes detailed code examples and explanations of how to implement these concepts, as well as links to GitHub repositories and a workshop that covers these topics in depth.
