AWS re:Invent 2025 - Own Your AI – Blazing Fast OSS AI on AWS (STP104)

Overview

  • The presentation focuses on Fireworks AI, a platform that enables companies to build and deploy high-performance, customized AI models using open-source technologies.
  • The speaker highlights the challenges of building production-ready AI agents and how Fireworks aims to address these challenges.

The Rise of AI Agents

  • The speaker predicts that the decade ahead will belong to agents - AI-powered applications that can automate various business tasks.
  • Fireworks has helped numerous companies across industries build and deploy AI agents for use cases like coding, document processing, sales and marketing, hiring, and customer service.

Challenges of Building AI Agents

  • Building AI agents for specific business use cases can be challenging due to various factors:
    • Choice of open-source vs. closed-source models
    • Achieving low latency (e.g., sub-300ms) at scale
    • Ensuring high accuracy and quality
    • Managing the cost of deployment
    • Dealing with infrastructure complexity (e.g., running large models on GPU clusters)

The Fireworks AI Platform

  • Fireworks is an open-source inference and customization engine that aims to simplify the process of building and deploying AI agents.
  • Key features:
    • Easy integration with popular open-source models (e.g., DeepSeek, Kimi, Llama, Mistral, Qwen)
    • Workload optimization using the "Fire Optimizer" to balance latency, cost, and quality
    • Fine-tuning capabilities, including both supervised and reinforcement learning approaches
    • Scalable and reliable infrastructure built on AWS services (EC2, ECS, EKS)
    • Flexible deployment options, from SaaS to fully air-gapped environments

Workload Optimization

  • Fireworks' "Fire Optimizer" analyzes a large parameter space (around 84,000 parameters) to determine the optimal configuration for a given workload, including model selection, hardware, execution modes, and kernel options.
  • This allows Fireworks to support a wide range of use cases, from low-latency search to complex, high-parameter "agentic AI" workloads.
  • Techniques like speculative decoding are used to achieve sub-100ms latency for certain use cases.
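The speculative decoding mentioned above can be sketched as follows. This is a toy illustration of the general technique, not Fireworks' implementation: a cheap draft model proposes a block of tokens, the expensive target model verifies the whole block in one pass, and every accepted token saves a target-model step. The "models" here are stand-in next-token functions, an assumption for the sake of a runnable example.

```python
def speculative_decode(target, draft, prompt, k=4, max_tokens=16):
    """Toy speculative decoding loop.

    target, draft: functions mapping a token context (list) to the next token.
    k: how many tokens the draft model proposes per round.
    Returns (generated tokens, number of target-model verification passes).
    """
    out = list(prompt)
    target_calls = 0
    while len(out) - len(prompt) < max_tokens:
        # 1. Draft model cheaply proposes k tokens.
        proposed, ctx = [], list(out)
        for _ in range(k):
            t = draft(ctx)
            proposed.append(t)
            ctx.append(t)
        # 2. Target model verifies the proposal; conceptually this is a
        #    single batched forward pass, so count it as one call.
        target_calls += 1
        accepted, ctx = [], list(out)
        for t in proposed:
            if target(ctx) == t:
                accepted.append(t)
                ctx.append(t)
            else:
                # First mismatch: take the target's own token and stop,
                # so each round still makes progress.
                accepted.append(target(ctx))
                break
        out.extend(accepted)
    return out[len(prompt):][:max_tokens], target_calls


# Illustrative toy models: when the draft agrees with the target, 8 tokens
# need only 2 verification passes instead of 8 sequential target steps.
target = lambda ctx: (len(ctx) * 7) % 10
draft = lambda ctx: (len(ctx) * 7) % 10
tokens, calls = speculative_decode(target, draft, [1, 2], k=4, max_tokens=8)
```

With a perfectly aligned draft the speedup is the block size k; a misaligned draft degrades gracefully back to one target step per token, which is why the technique helps latency without changing output quality.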

Fine-Tuning and Customization

  • Fireworks emphasizes the importance of companies owning their AI models and IP, rather than relying on generic, closed-source models.
  • The platform provides easy-to-use fine-tuning capabilities, both supervised and reinforcement learning, to allow companies to customize models with their own data and expertise.
  • This approach has been shown to outperform closed-source providers in terms of quality, latency, and cost for use cases like text-to-SQL and product catalog cleansing.
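One reason fine-tuning open models is cheap enough to be practical is parameter-efficient adaptation. The talk does not specify Fireworks' method, but a common approach is a LoRA-style low-rank adapter: the base weight matrix W stays frozen and only a small delta A·B is trained, then added back at serving time. A minimal pure-Python sketch of that idea (real systems use a tensor library):

```python
def matmul(a, b):
    """Plain-list matrix multiply: a is m x n, b is n x p."""
    rows, inner, cols = len(a), len(b), len(b[0])
    return [[sum(a[i][k] * b[k][j] for k in range(inner)) for j in range(cols)]
            for i in range(rows)]


def apply_lora(W, A, B, scale=1.0):
    """Return W + scale * (A @ B).

    W is the frozen d x d base weight; A (d x r) and B (r x d) are the
    trainable low-rank factors. With rank r << d, the fine-tune touches
    only 2*d*r parameters instead of d*d.
    """
    delta = matmul(A, B)
    return [[W[i][j] + scale * delta[i][j] for j in range(len(W[0]))]
            for i in range(len(W))]


# Tiny example: a rank-1 update to a 3x3 identity weight nudges one
# entry while leaving the rest of the base model untouched.
W = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
A = [[1], [0], [0]]
B = [[0, 2, 0]]
adapted = apply_lora(W, A, B)
```

Because the delta is small and separable from the base weights, a customer's adapter is easy to store, swap, and own independently of the underlying open-source model, which fits the "own your AI" framing of the talk.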

Real-World Examples

  • Grocery delivery platform: Used Fireworks to fine-tune a small LLM for fast, low-latency search, reducing support tickets by 50% and improving user experience.
  • Notion: Moved from a closed-source provider to Fireworks, achieving 4x lower latency (under 500ms) while scaling to over 100 million users.
  • DoorDash: Leveraged Fireworks' vision-language models (VLMs) to fine-tune a product catalog cleansing model, running 3x faster than the previous closed-source solution with 10% cost savings.

Conclusion

  • Fireworks aims to be a one-stop shop for building "magical AI applications" that match or exceed the quality of closed-source providers while maintaining low latency and cost.
  • The platform's focus on open-source models, workload optimization, and customization through fine-tuning allows companies to own their AI and leverage their unique data and expertise.
