AWS re:Invent 2025 - Delighting Slack users safely and quickly with Amazon Nova and Bedrock (AIM384)

Scaling Slack's Generative AI Capabilities with Amazon Bedrock and Experimentation

Developing a Scalable and Secure Infrastructure

  • Slack's key priorities for their Slack AI features:
    • Trust: Ensure customer data is not used to train models and provide opt-out options
    • Security: Operate within FedRAMP Moderate compliance and maintain data security
    • Reliability: Ensure high availability and contextual relevance of AI responses
  • Challenges with initial SageMaker-based architecture:
    • Spiky traffic patterns with differing latency requirements across features
    • GPU availability constraints leading to over-provisioning
    • Limited flexibility to experiment with new language models
  • Migration to Amazon Bedrock:
    • Leveraged Bedrock's FedRAMP Moderate compliance and data isolation guarantees
    • Performed gradual migration with shadow traffic testing and staged cutover
    • Utilized Bedrock's on-demand pricing and cross-region inference to improve cost efficiency
    • Implemented backup models, emergency stops, and other operational improvements
  • Key benefits of the Bedrock migration:
    • Increased flexibility to experiment with 15+ language models in production
    • Improved reliability through model fallbacks and emergency response capabilities
    • Over 90% cost savings, equating to over $20 million annually
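The "backup models and emergency stops" pattern above can be sketched as a small fallback loop. This is a minimal illustration, not Slack's implementation: the model IDs are assumed, and the `invoke` callable is injected so the logic stays testable without AWS credentials (in production it would wrap the Bedrock Runtime API).

```python
# Illustrative model preference order (assumed IDs, not Slack's config)
MODEL_PREFERENCE = ["amazon.nova-pro-v1:0", "amazon.nova-lite-v1:0"]

class EmergencyStop(Exception):
    """Raised when operators have disabled AI responses via a kill switch."""

def invoke_with_fallback(prompt, invoke, emergency_stop=False):
    """Try each model in preference order; return (model_id, response).

    `invoke` is any callable taking (model_id, prompt). Injecting it keeps
    the fallback logic independent of the actual Bedrock client.
    """
    if emergency_stop:
        raise EmergencyStop("AI responses disabled by operators")
    last_error = None
    for model_id in MODEL_PREFERENCE:
        try:
            return model_id, invoke(model_id, prompt)
        except Exception as err:  # e.g. throttling or a regional outage
            last_error = err
    raise RuntimeError("all models failed") from last_error
```

With this shape, a throttled primary model transparently falls through to the backup, and the kill switch short-circuits everything before any model is called.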

Developing an Experimentation Framework for Quality Assurance

  • Challenges in evaluating generative AI quality:
    • Subjective nature of outputs makes traditional metrics insufficient
    • Need to measure both objective and subjective quality dimensions
  • Slack's quality evaluation framework:
    • Objective quality: Rendering, formatting, parsing accuracy
    • Subjective quality: Factual accuracy, answer relevancy, attribution accuracy
    • Safety: Measuring for toxicity, bias, and security vulnerabilities
  • Experimentation workflow:
    • Offline testing on "golden" and "validation" datasets
    • Online A/B testing with automated quality and user satisfaction metrics
  • Example use case: Search query understanding optimization
    • Replaced a high-cost, high-latency LLM with the Amazon Nova Lite model
    • Achieved 46% latency reduction and 70% cost savings without quality regression
  • Overall impact:
    • 90% reduction in Slack AI cost per monthly active user
    • 5x increase in Slack AI scale while improving user satisfaction by 15-30%
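The offline step of the workflow above can be sketched as a scorer run over a "golden" dataset, reporting one objective dimension (valid formatting) and one subjective proxy (attribution accuracy). The record fields and scoring rules here are assumptions for illustration, not Slack's actual metrics.

```python
import json

def evaluate(golden, outputs):
    """Score paired golden/candidate records; return per-dimension pass rates.

    - format_accuracy: does the candidate output parse as JSON? (objective)
    - attribution_accuracy: are all cited sources drawn from the golden
      record's known sources? (a cheap proxy for attribution quality)
    """
    fmt_ok = attr_ok = 0
    for gold, out in zip(golden, outputs):
        try:
            parsed = json.loads(out)   # objective check: well-formed output
            fmt_ok += 1
        except json.JSONDecodeError:
            continue                   # malformed output fails both checks
        if set(parsed.get("sources", [])) <= set(gold["sources"]):
            attr_ok += 1               # every citation is a known source
    n = len(golden)
    return {"format_accuracy": fmt_ok / n, "attribution_accuracy": attr_ok / n}
```

A harness like this gates changes before the online A/B test: a candidate model or prompt only graduates to live traffic if its offline scores hold up against the incumbent's.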

Integrating Generative AI Across Slack's Product

  • Spectrum of generative AI complexity at Slack:
    • Low complexity: Classification, structured data conversion
    • Medium complexity: Summarization, basic content generation
    • High complexity: Agentic workflows, advanced content generation
  • Importance of selecting the right language model for each use case
  • Example: Search query understanding optimization
    • Leveraged Slack's infrastructure and experimentation framework
    • Replaced a high-cost, high-latency LLM with the Amazon Nova Lite model
    • Achieved 46% latency reduction and 70% cost savings without quality regression
  • Overall business impact:
    • 90% reduction in Slack AI cost per monthly active user
    • 5x increase in Slack AI scale while improving user satisfaction by 15-30%
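The "right model for each use case" idea above can be expressed as a static routing table from complexity tier to model. The tiers mirror the spectrum described in this section; the model IDs are illustrative assumptions, not Slack's actual mapping.

```python
# Assumed tier-to-model map (illustrative IDs, not Slack's configuration)
COMPLEXITY_ROUTES = {
    "low":    "amazon.nova-micro-v1:0",  # classification, structured conversion
    "medium": "amazon.nova-lite-v1:0",   # summarization, basic generation
    "high":   "amazon.nova-pro-v1:0",    # agentic workflows, advanced generation
}

def route_model(task_complexity: str) -> str:
    """Pick a model for a task; unknown tiers fall back to the high tier."""
    return COMPLEXITY_ROUTES.get(task_complexity, COMPLEXITY_ROUTES["high"])
```

Defaulting unknown tiers to the most capable model trades cost for safety; the experimentation framework above is what justifies moving a use case down a tier once quality is proven to hold.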
