AWS re:Invent 2025 - Delighting Slack users safely and quickly with Amazon Nova and Bedrock (AIM384)
Scaling Slack's Generative AI Capabilities with Amazon Bedrock and Experimentation
Developing a Scalable and Secure Infrastructure
Slack's key priorities for their Slack AI features:
Trust : Ensure customer data is not used to train models and provide opt-out options
Security : Operate within FedRAMP Moderate compliance and maintain data security
Reliability : Ensure high availability and contextual relevance of AI responses
Challenges with initial SageMaker-based architecture:
Peaky traffic patterns with different latency requirements
GPU availability constraints leading to over-provisioning
Limited flexibility to experiment with new language models
Migration to Amazon Bedrock:
Leveraged Bedrock's FedRAMP Moderate compliance and data isolation guarantees
Performed gradual migration with shadow traffic testing and staged cutover
Utilized Bedrock's on-demand pricing and cross-region inference to improve cost efficiency
Implemented backup models, emergency stops, and other operational improvements
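The backup-model and emergency-stop pattern described above can be sketched as a small wrapper. This is a minimal illustration, not Slack's actual code: the `invoke` callable and `emergency_stop` kill switch are assumed names, and in production `invoke` would wrap a Bedrock runtime client call.

```python
from typing import Callable, Optional, Sequence

def invoke_with_fallback(
    prompt: str,
    model_ids: Sequence[str],
    invoke: Callable[[str, str], str],
    emergency_stop: Callable[[], bool] = lambda: False,
) -> str:
    """Try models in priority order; refuse all calls if the kill switch is on."""
    if emergency_stop():
        # Operational "emergency stop": disable the feature rather than degrade it.
        raise RuntimeError("emergency stop engaged; no model invoked")
    last_error: Optional[Exception] = None
    for model_id in model_ids:
        try:
            # e.g. a wrapper around the Bedrock runtime Converse API
            return invoke(model_id, prompt)
        except Exception as err:  # throttling, timeout, model outage, ...
            last_error = err
    raise RuntimeError("primary and all backup models failed") from last_error
```

Keeping the model list and kill switch as injected parameters is what makes staged cutovers and shadow-traffic tests easy: the same call site can be pointed at old and new model orderings without code changes.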
Key benefits of the Bedrock migration:
Increased flexibility to experiment with 15+ language models in production
Improved reliability through model fallbacks and emergency response capabilities
Over 90% cost savings, equating to over $20 million annually
Developing an Experimentation Framework for Quality Assurance
Challenges in evaluating generative AI quality:
Subjective nature of outputs makes traditional metrics insufficient
Need to measure both objective and subjective quality dimensions
Slack's quality evaluation framework:
Objective quality: Rendering, formatting, parsing accuracy
Subjective quality: Factual accuracy, answer relevancy, attribution accuracy
Safety: Measuring for toxicity, bias, and security vulnerabilities
Experimentation workflow:
Offline testing on "golden" and "validation" datasets
Online A/B testing with automated quality and user satisfaction metrics
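The offline step of the workflow above can be reduced to a simple promotion gate over per-example quality scores on the golden dataset. A sketch under assumed names, with a hypothetical one-point regression tolerance:

```python
def mean(scores: list) -> float:
    return sum(scores) / len(scores)

def regression_gate(
    baseline_scores: list,
    candidate_scores: list,
    max_drop: float = 0.01,
) -> bool:
    """Promote the candidate to online A/B testing only if its average quality
    on the golden dataset drops no more than max_drop below the baseline."""
    return mean(candidate_scores) >= mean(baseline_scores) - max_drop
```

Only candidates that clear this gate would proceed to online A/B tests, where automated quality and user-satisfaction metrics make the final call.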
Example use case: Search query understanding optimization
Replaced a high-cost, high-latency LLM with the Amazon Nova Lite model
Achieved 46% latency reduction and 70% cost savings without quality regression
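As a concrete illustration of this swap, here is a hypothetical request builder for the Bedrock Converse API targeting the public Nova Lite model ID. The prompt text and inference settings are assumptions for illustration, not Slack's production prompt:

```python
NOVA_LITE_MODEL_ID = "amazon.nova-lite-v1:0"  # public Bedrock model ID for Nova Lite

def build_query_understanding_request(raw_query: str) -> dict:
    """Build the kwargs for bedrock_runtime.converse(**request)."""
    return {
        "modelId": NOVA_LITE_MODEL_ID,
        "messages": [{
            "role": "user",
            # Illustrative prompt; the real query-understanding prompt is not public.
            "content": [{"text": f"Rewrite this search query for retrieval: {raw_query}"}],
        }],
        # A low token cap and temperature 0 suit a short, deterministic rewrite task.
        "inferenceConfig": {"maxTokens": 128, "temperature": 0.0},
    }

# In production this would be sent with something like:
#   boto3.client("bedrock-runtime").converse(**build_query_understanding_request(q))
```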
Overall impact:
90% reduction in Slack AI cost per monthly active user
5x increase in Slack AI scale while improving user satisfaction by 15-30%
Integrating Generative AI Across Slack's Product
Spectrum of generative AI complexity at Slack:
Low complexity: Classification, structured data conversion
Medium complexity: Summarization, basic content generation
High complexity: Agentic workflows, advanced content generation
Importance of selecting the right language model for each use case
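Model selection across that complexity spectrum can be expressed as a small routing table. The task categories below follow the talk's spectrum, while the Nova model IDs are assumed placeholders rather than Slack's actual routing:

```python
# Hypothetical routing from task complexity to a Bedrock model ID.
TASK_TO_MODEL = {
    # low complexity: fastest, cheapest tier
    "classification": "amazon.nova-micro-v1:0",
    "structured_conversion": "amazon.nova-micro-v1:0",
    # medium complexity
    "summarization": "amazon.nova-lite-v1:0",
    "basic_generation": "amazon.nova-lite-v1:0",
    # high complexity: most capable (and most expensive) tier
    "agentic_workflow": "amazon.nova-pro-v1:0",
    "advanced_generation": "amazon.nova-pro-v1:0",
}

def pick_model(task: str) -> str:
    model_id = TASK_TO_MODEL.get(task)
    if model_id is None:
        raise ValueError(f"no model configured for task {task!r}")
    return model_id
```

Centralizing the mapping like this is what lets a team re-point one task at a cheaper model (as in the query-understanding example) without touching call sites.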
Example: Search query understanding optimization
Leveraged Slack's infrastructure and experimentation framework