AWS re:Invent 2025 - 800GB of AI data: Lessons from a failed project (DEV345)

Overview

  • The presentation discusses a failed project by a solutions architect named Sandeep at Antstack, a company that builds serverless and AI applications.
  • The project involved analyzing a large dataset of over 800GB of game data from the online multiplayer game Valorant, with the goal of creating an optimal team of 5 players.

The Valorant Dataset

  • The dataset consisted of 7,500 game files spanning 3 different leagues and 3 years of gameplay.
  • Each game file was a large JSON document, ranging from 200MB to 800MB in size.
  • The dataset contained detailed information about every aspect of the game, including player actions, abilities used, kills, deaths, and more.

Initial Approach and Challenges

  • The team initially tried to process the data locally, but their laptops crashed due to the large file sizes.
  • They then used Amazon Lightsail to run the data processing scripts on a remote server, which allowed them to handle the 800GB dataset.
  • After decompressing the data, the team started analyzing the JSON files, identifying key metrics and characteristics for each player.
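The per-file analysis step above can be sketched roughly as follows. This is a minimal illustration, not the team's actual code: it processes one game file at a time so only a single JSON document is in memory, and the field names (`players`, `kills`, `deaths`) are hypothetical since the talk does not publish the dataset's schema.

```python
import json
from collections import defaultdict
from pathlib import Path

def aggregate_player_stats(game_dir: str) -> dict:
    """Aggregate per-player metrics across many game files.

    Loads one file at a time, so memory use is bounded by the largest
    single game rather than the whole 800GB dataset. Field names are
    hypothetical stand-ins for the real schema.
    """
    totals = defaultdict(lambda: {"kills": 0, "deaths": 0, "games": 0})
    for path in sorted(Path(game_dir).glob("*.json")):
        with open(path) as f:
            game = json.load(f)  # only this one game in memory
        for player in game.get("players", []):
            stats = totals[player["name"]]
            stats["kills"] += player.get("kills", 0)
            stats["deaths"] += player.get("deaths", 0)
            stats["games"] += 1
    return dict(totals)
```

For files in the hundreds of megabytes, a streaming parser would reduce memory further, but even plain one-file-at-a-time loading is enough to avoid holding the full dataset at once.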

Vectorization and Vector Databases

  • The team attempted to use a vector database, via Amazon Bedrock Knowledge Bases, to store and query the player data.
  • They experimented with different chunking strategies, including default chunking (splitting files into 300-token chunks) and no chunking (storing each player's data as a separate vector record).
  • However, the vector database approach had several issues:
    • The vectorization process removed important context and relationships between the data, making it difficult to query effectively.
    • The vector representations did not preserve the mathematical relationships between numerical values, leading to suboptimal results.
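The default fixed-size chunking described above can be sketched like this. A plain whitespace split stands in for a real tokenizer, so the counts are approximate; the point is that any player record longer than one chunk gets cut mid-record, which is one way the context loss described above arises.

```python
def chunk_text(text: str, max_tokens: int = 300) -> list[str]:
    """Split text into consecutive chunks of roughly max_tokens tokens.

    Whitespace splitting is a rough stand-in for a real tokenizer;
    this only approximates a knowledge base's default chunking.
    """
    tokens = text.split()
    return [
        " ".join(tokens[i:i + max_tokens])
        for i in range(0, len(tokens), max_tokens)
    ]
```

The "no chunking" alternative from the talk corresponds to skipping this step entirely and storing each player's full record as one vector.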

Leveraging Large Language Models (LLMs)

  • After the vector database approach failed, the team decided to leverage LLMs to help select the optimal team of 5 players.
  • They implemented a multi-stage approach using AWS Step Functions to orchestrate different LLM-based tasks:
    • Selecting the best players for each map and character
    • Assigning players to roles and creating a game strategy
    • Caching responses in DynamoDB to support a chatbot interface
  • However, this approach also faced challenges, including:
    • Cognitive overload for the LLM due to the complexity of the prompts
    • Limitations on the size of data that could be returned from the Lambda functions
    • LLM invocations stalling and hitting execution timeouts
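The staged pipeline with caching can be sketched as a toy in-process version. This is illustrative only: a plain dict stands in for DynamoDB, a stub replaces the real Bedrock/LLM call, and the stage prompts are invented, since the talk doesn't publish them.

```python
import hashlib
import json

# Stand-ins: a dict plays the role of the DynamoDB cache, and a
# deterministic stub replaces the actual LLM invocation.
cache: dict[str, str] = {}

def call_llm(prompt: str) -> str:
    return f"llm-response:{hashlib.sha256(prompt.encode()).hexdigest()[:8]}"

def cached_llm(prompt: str) -> str:
    """Return a cached response when the exact prompt was seen before."""
    key = hashlib.sha256(prompt.encode()).hexdigest()
    if key not in cache:
        cache[key] = call_llm(prompt)
    return cache[key]

def build_team(player_stats: dict) -> dict:
    """Run the work as small, single-purpose stages, one state
    machine step each in the real Step Functions design."""
    stats_json = json.dumps(player_stats, sort_keys=True)
    picks = cached_llm(f"Pick the best 5 players per map:\n{stats_json}")
    roles = cached_llm(f"Assign roles to these players:\n{picks}")
    strategy = cached_llm(f"Write a game strategy for:\n{roles}")
    return {"picks": picks, "roles": roles, "strategy": strategy}
```

Splitting the work this way keeps each prompt small (addressing the cognitive-overload problem), while the cache lets the chatbot reuse results instead of re-running slow LLM calls.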

Key Lessons Learned

  • Don't teach an LLM what it already knows: The team found more success when they provided the LLM with the dataset schema and let it leverage its existing knowledge, rather than prescribing a specific approach.
  • Avoid cognitive overload: Splitting the problem into smaller, more manageable tasks and using Step Functions to orchestrate them proved more effective than asking the LLM to do too many things at once.
  • Consider the limitations of the underlying technologies: The team had to account for issues like data size limits and execution timeouts when integrating the LLM with other AWS services.
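The first lesson, giving the model the schema rather than a prescribed procedure, amounts to a prompt shaped roughly like the one below. The helper and the schema fields are hypothetical, assembled here only to illustrate the idea.

```python
def build_schema_prompt(question: str, schema: dict) -> str:
    """Describe the dataset's shape and let the model decide how to
    use it, instead of spelling out a step-by-step procedure.

    The schema dict maps field names to short descriptions; both the
    helper and example fields are illustrative, not from the talk.
    """
    field_lines = "\n".join(f"- {name}: {desc}" for name, desc in schema.items())
    return (
        "You are analyzing esports match data with these fields:\n"
        f"{field_lines}\n\n"
        f"Question: {question}"
    )
```

The model already knows what kills, deaths, and maps mean in a competitive shooter; the prompt only needs to tell it which fields exist.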

Conclusion

  • The project, which initially seemed like a simple weekend task, turned out to be a significant challenge for the team.
  • The lessons learned from this failed project highlight the importance of carefully designing and testing AI-powered solutions, especially when dealing with large and complex datasets.
  • By sharing their experience, the presenters hope to help other developers avoid similar pitfalls and better understand the practical considerations when applying LLMs and other AI technologies to real-world problems.
