Refine RAG performance: Use Unstructured for enhanced data ingestion (AIM223)

Unstructured: Streamlining Data Transformation for Large Language Models

Introduction

  • The speaker represents Unstructured, a two-year-old company with 10,000 paying customers and 18.5 million product downloads.
  • Unstructured has raised $65 million in venture capital and has a paid platform that launched this Monday.

Unstructured's Focus

  • Unstructured is focused on the problem of transforming unstructured data into a format that is consumable by large language models (LLMs).
  • LLMs can hallucinate and lack access to an organization's internal data, so Unstructured aims to bring that data to the models.

The Rise of Agents

  • The speaker has been hearing a lot about agents: about half of the company's calls come from people building agent-based workflows meant to replace the work of various teams (product, HR, sales, marketing).
  • Unstructured is engaged with several agent companies and is looking at how to partner with them and supply data.

The Commoditization of Models

  • The speaker stresses that the models are not boring; they call them the most beautiful piece of technology they have ever seen.
  • However, the models are becoming commoditized. Unstructured's large ML team has spent the last two years working out the best way to transform file types such as PDF, PowerPoint, and Word into a canonical JSON schema.
  • Unstructured uses a combination of its own models and the Clovis model (built by a major AI company) to achieve high-quality data transformation.
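
To make the idea of a canonical JSON schema concrete, here is a minimal Python sketch. The talk does not show the actual schema, so the field names (`type`, `text`, `metadata`) and the `Element` class are assumptions chosen for illustration only. The point is that content from different file types ends up in one uniform shape.

```python
import json
from dataclasses import dataclass, field, asdict


@dataclass
class Element:
    """One piece of a parsed document (hypothetical schema, for illustration)."""
    type: str                                      # e.g. "Title", "NarrativeText", "Table"
    text: str                                      # extracted text content
    metadata: dict = field(default_factory=dict)   # e.g. source file and page


def to_canonical_json(elements):
    """Serialize elements from any file type into one uniform JSON list."""
    return json.dumps([asdict(e) for e in elements], indent=2)


# Elements that might come out of a PDF and a PowerPoint deck.
elements = [
    Element("Title", "Q3 Report", {"filename": "report.pdf", "page": 1}),
    Element("NarrativeText", "Revenue grew.", {"filename": "deck.pptx", "page": 2}),
]
canonical = to_canonical_json(elements)
print(canonical)
```

Because every record has the same keys regardless of the source file, downstream chunking, embedding, and indexing code only needs to handle one format.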

Unstructured's Approach

  • Unstructured concentrates on transforming unstructured data, and its output works with either traditional vector storage or knowledge graphs.
  • They use object detection to identify different elements in a document (text, images, tables) and apply various transformation strategies to each element.
  • Unstructured aims to be the "home and hub" for orchestrating all the different models, embedding models, and transformation models available, allowing users to leverage the best tools for their specific data needs.
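
The per-element routing described above can be sketched as a simple dispatch table. This is an illustration under stated assumptions, not Unstructured's implementation: the detection step is stubbed out as a hand-written list, and the handler names (`transform_text`, `transform_table`, `transform_image`) are hypothetical.

```python
def transform_text(payload):
    # Collapse runs of whitespace left over from extraction.
    return " ".join(payload.split())


def transform_table(payload):
    # Render rows (lists of cells) as pipe-delimited lines.
    return "\n".join(" | ".join(str(cell) for cell in row) for row in payload)


def transform_image(payload):
    # A real pipeline might caption the image with a vision model;
    # this placeholder only records the image reference.
    return f"[image: {payload}]"


# Map each detected element kind to its transformation strategy.
STRATEGIES = {
    "text": transform_text,
    "table": transform_table,
    "image": transform_image,
}


def transform(detected):
    """Apply the matching strategy to each (kind, payload) pair."""
    results = []
    for kind, payload in detected:
        handler = STRATEGIES.get(kind)
        if handler is None:
            raise ValueError(f"no strategy for element kind {kind!r}")
        results.append(handler(payload))
    return results


# Stand-in for the output of an object detection step (assumed shape).
detected = [
    ("text", "Revenue   grew\n in Q3."),
    ("table", [["Quarter", "Revenue"], ["Q3", 12]]),
    ("image", "chart.png"),
]
outputs = transform(detected)
```

Keeping the strategies in a table makes it easy to swap in a different model for one element kind without touching the others, which is the orchestration role the talk describes.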

Unstructured's Platform

  • Unstructured has developed a platform that allows users to configure sources, destinations, and workflows for their data transformation needs.
  • The platform is designed to be fast, scalable, and cost-effective, with different transformation strategies (fast, high-resolution, and high-performance) to cater to various user requirements.
  • Unstructured manages all the third-party integrations, quotas, and billing, so users only need to pay Unstructured once.
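
As a rough sketch of what configuring a source, a destination, and a strategy might look like, consider the following. The `Workflow` class, its field names, and the strategy identifiers are hypothetical; they mirror the three strategies named in the talk but are not the platform's actual API.

```python
from dataclasses import dataclass

# Strategy names taken from the talk; the identifiers themselves are assumed.
ALLOWED_STRATEGIES = {"fast", "high_resolution", "high_performance"}


@dataclass(frozen=True)
class Workflow:
    """A hypothetical workflow: where data comes from, how it is
    transformed, and where the result is written."""
    source: str
    destination: str
    strategy: str = "fast"

    def __post_init__(self):
        # Reject unknown strategies early, before any data is processed.
        if self.strategy not in ALLOWED_STRATEGIES:
            raise ValueError(f"unknown strategy: {self.strategy!r}")


def describe(workflow):
    """Return a one-line summary of the workflow."""
    return f"{workflow.source} -> [{workflow.strategy}] -> {workflow.destination}"


wf = Workflow(source="s3://example-bucket/docs",
              destination="vector-store",
              strategy="high_resolution")
summary = describe(wf)
print(summary)
```

Validating the strategy at construction time reflects a common design choice: configuration errors surface when the workflow is defined rather than partway through a large ingestion job.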

Conclusion

  • Unstructured argues that, as models become commoditized, data transformation has become the key challenge. The company aims to help organizations streamline that process and get high-quality data into their LLM-based systems.
  • The company invites attendees to visit their booth (1895) to learn more about their platform and solutions.
