AWS re:Invent 2025 - How Spice AI operationalizes data lakes for AI using Amazon S3 (STG364)

Operationalizing Data Lakes for AI with Amazon S3 and Spice AI

Overview

This presentation from AWS re:Invent 2025 showcases how Spice AI, a day-one launch partner for Amazon S3 Vectors, helps organizations operationalize their data lakes for AI workloads using Amazon S3 and related services.

Key Challenges in Adopting AI

The presenters highlight several key challenges organizations face when trying to leverage their data lakes for AI:

  • Data Fragmentation and Silos: Data required for AI workloads is often spread across multiple purpose-built systems, making it difficult to integrate.
  • Privacy and Security Governance: Extending data governance and security policies across disparate data sources is complex.
  • Cost-Effective Scaling: Scaling data storage and processing for AI can be expensive.
  • Retrieval Accuracy: Ensuring high-quality search and retrieval of relevant data is crucial for building effective AI agents and workflows.
  • Integration Complexity: Stitching together diverse data sources, storage formats, and processing requirements is challenging.
  • Observability and Logging: Tracking and observing the end-to-end data and AI pipeline is important for monitoring and troubleshooting.

Leveraging Amazon S3 and S3 Vectors

The presenters explain how Amazon S3 has become the foundation for many organizations' data lakes, providing the scale, durability, and cost-efficiency required. However, to fully leverage these data lakes for AI workloads, additional capabilities are needed:

  • S3 Tables: Stores tabular data in the Apache Iceberg table format natively in S3, so structured data in the data lake can be queried directly.
  • S3 Vectors: Provides native storage and similarity queries for vector embeddings, which the presenters describe as the "language of AI," underpinning use cases such as retrieval, recommendation, and natural language processing.
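To make the S3 Vectors bullet concrete, the sketch below models what a vector index does conceptually: store keyed embeddings with metadata and return the nearest neighbors to a query vector. It is a toy in-memory stand-in written for illustration only; `ToyVectorIndex`, `put`, and `query` are hypothetical names, this is not the S3 Vectors API, and it uses brute-force cosine distance rather than whatever indexing S3 Vectors uses internally.

```python
import math
from dataclasses import dataclass, field


def cosine_distance(a, b):
    """1 - cosine similarity; 0.0 means the vectors point the same way."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm if norm else 1.0


@dataclass
class ToyVectorIndex:
    """Hypothetical in-memory vector index (not the S3 Vectors API)."""
    dimension: int
    items: dict = field(default_factory=dict)

    def put(self, key, vector, metadata=None):
        # Reject vectors whose dimension does not match the index.
        if len(vector) != self.dimension:
            raise ValueError(f"expected {self.dimension} dims, got {len(vector)}")
        self.items[key] = (list(vector), metadata or {})

    def query(self, vector, top_k=3, metadata_filter=None):
        hits = []
        for key, (stored, meta) in self.items.items():
            # Optional exact-match metadata filter, applied before ranking.
            if metadata_filter and any(meta.get(k) != v for k, v in metadata_filter.items()):
                continue
            hits.append((key, cosine_distance(vector, stored)))
        hits.sort(key=lambda hit: hit[1])  # brute force: smallest distance first
        return hits[:top_k]


index = ToyVectorIndex(dimension=3)
index.put("doc-1", [1.0, 0.0, 0.0], {"topic": "storage"})
index.put("doc-2", [0.0, 1.0, 0.0], {"topic": "ml"})
index.put("doc-3", [0.9, 0.1, 0.0], {"topic": "storage"})
print(index.query([1.0, 0.0, 0.0], top_k=2))
```

A production vector store typically avoids scanning every item on each query; the brute-force loop here only illustrates the contract of putting keyed vectors in and asking for the closest ones back.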

Spice AI's Role in Operationalizing Data Lakes

Spice AI, as a key partner for AWS, helps organizations overcome the challenges of operationalizing data lakes for AI by:

  1. Ingestion and Integration:

    • Provides a seamless way to ingest data into Amazon S3, S3 Tables, and S3 Vectors.
    • Federates data from multiple sources, including databases, data warehouses, and other systems.
  2. Indexing and Caching:

    • Automatically indexes and caches data in S3 Vectors and S3 Tables for fast retrieval and querying.
    • Handles complex tasks like index partitioning, sharding, and metadata management.
  3. Hybrid Search and Retrieval:

    • Combines full-text search (BM25) with vector-based semantic search on the data.
    • Fuses the full-text and semantic results and reranks them into a single, high-quality ranked result set.
  4. AI Integration:

    • Integrates the data lake with AI/ML models, such as Amazon Titan embedding models, for embedding generation, analysis, and inference.
    • Allows executing custom AI/ML functions directly on the data lake.
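Item 3 above names BM25 as the full-text half of hybrid search. The following is a minimal pure-Python sketch of Okapi BM25 scoring, assuming simple lowercase alphanumeric tokenization with no stemming; `bm25_scores` is a hypothetical helper, and the `k1`/`b` values are common textbook defaults rather than anything Spice AI is known to use.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Lowercase alphanumeric tokens; no stemming or stop-word removal.
    return re.findall(r"[a-z0-9]+", text.lower())


def bm25_scores(query, documents, k1=1.5, b=0.75):
    """Return one Okapi BM25 score per document, in input order."""
    if not documents:
        return []
    docs = [tokenize(d) for d in documents]
    avg_len = sum(len(d) for d in docs) / len(docs)
    # Document frequency: how many documents contain each term.
    df = Counter(term for d in docs for term in set(d))
    query_terms = tokenize(query)
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query_terms:
            if term not in tf:
                continue
            # Rare terms get a larger inverse document frequency weight.
            idf = math.log((len(docs) - df[term] + 0.5) / (df[term] + 0.5) + 1.0)
            # Term frequency saturates via k1; b normalizes for document length.
            denom = tf[term] + k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1) / denom
        scores.append(score)
    return scores


docs = [
    "S3 Vectors stores embeddings for semantic search",
    "Iceberg tables in S3 Tables support SQL queries",
    "Kafka streams deliver questions and answers",
]
print(bm25_scores("S3 Tables", docs))
```

Here the second document scores highest because it matches both query terms and repeats the rarer one ("tables"), while the third scores zero because it shares no terms with the query.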

Demonstration and Technical Details

The presenters walk through a live demonstration of the Spice AI platform, showcasing how it can be used to:

  1. Ingest Real-Time Data: Streams questions and answers from Kafka into the data lake.
  2. Query Historical Data: Queries a 250,000-record dataset stored in S3 Tables and S3 Vectors.
  3. Perform Hybrid Search: Combines full-text search and vector-based semantic search to provide high-quality, relevant results.
  4. Integrate with AI/ML: Passes the search results to an Amazon Nova model to extract relevant technology keywords.
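The summary does not say how Spice AI merges full-text and semantic results in the hybrid search step. One common technique is reciprocal rank fusion (RRF), sketched below as an assumption: each document earns 1 / (k + rank) from every ranked list it appears in, so items ranked well by both searches rise to the top. The function name, the sample IDs, and the k=60 default are illustrative choices.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several best-first lists of document IDs into one list."""
    fused = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            # Each list adds 1 / (k + rank); appearing high in both lists wins.
            fused[doc_id] = fused.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID for deterministic output.
    return sorted(fused, key=lambda doc_id: (-fused[doc_id], doc_id))


full_text_hits = ["q17", "q42", "q08"]  # e.g. BM25 order
semantic_hits = ["q42", "q99", "q17"]   # e.g. vector-similarity order
print(reciprocal_rank_fusion([full_text_hits, semantic_hits]))
# prints ['q42', 'q17', 'q99', 'q08']
```

RRF only needs ranks, not raw scores, which sidesteps the problem that BM25 scores and vector distances live on different scales.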

The demonstration highlights how Spice AI abstracts away the complex distributed systems challenges of working with Amazon S3, S3 Tables, and S3 Vectors, allowing developers to focus on building AI-powered applications with just a few lines of YAML configuration.
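The summary credits Spice AI with handling index partitioning and sharding behind the configuration. To make that idea concrete, here is a hypothetical sketch of hash-based sharding with scatter-gather top-k queries; `ShardedVectorStore` and its methods are invented for illustration, it uses plain Euclidean distance, and Spice AI's actual partitioning strategy may differ.

```python
import hashlib
import heapq
import math


def l2_distance(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


class ShardedVectorStore:
    """Hypothetical hash-sharded vector store with scatter-gather queries."""

    def __init__(self, num_shards):
        self.shards = [{} for _ in range(num_shards)]

    def shard_for(self, key):
        # Stable hash (unlike built-in hash()) so a key always maps to the same shard.
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") % len(self.shards)

    def put(self, key, vector):
        self.shards[self.shard_for(key)][key] = vector

    def query(self, vector, top_k=3):
        candidates = []
        for shard in self.shards:
            # Scatter: each shard returns only its local top-k.
            local = ((l2_distance(vector, v), key) for key, v in shard.items())
            candidates.extend(heapq.nsmallest(top_k, local))
        # Gather: the global top-k is always contained in the union of local top-k lists.
        return [key for _, key in heapq.nsmallest(top_k, candidates)]


store = ShardedVectorStore(num_shards=4)
for i in range(20):
    store.put(f"doc-{i}", [float(i), 0.0])
print(store.query([3.2, 0.0], top_k=3))
# prints ['doc-3', 'doc-4', 'doc-2']
```

Because shard assignment here depends on the shard count, changing `num_shards` would remap most keys; systems that need to resize often use consistent hashing or range partitioning instead, which is beyond this sketch.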

Business Impact and Use Cases

By operationalizing data lakes for AI using Amazon S3 and Spice AI, organizations can:

  • Accelerate Time-to-Value: Quickly build and deploy AI-powered applications without the overhead of managing complex data infrastructure.
  • Improve Search and Retrieval: Provide high-quality, semantically relevant search results to power AI agents and workflows.
  • Leverage Existing Investments: Capitalize on their existing investments in Amazon S3 and extend the value of their data lakes for AI.
  • Expand to New AI Use Cases: Easily integrate data lakes with AI/ML models to enable a wide range of AI-powered applications, such as security analysis, fraud detection, and sentiment analysis.

Conclusion and Call to Action

The presenters encourage the audience to try Spice AI, which is available as an open-source project, and to visit them at the event to learn more and receive t-shirts and other swag.
