AWS re:Invent 2025 - Autodesk's ML Inference Optimization: Leveraging AWS AI Chips (SPS201)

The Future of AI Compute

  • According to Gartner, spending on AI infrastructure is expected to double from $18 billion this year to $36 billion next year.
  • Customers are looking for more flexibility in their AI infrastructure, rather than just throwing more GPUs at larger models.
  • Performance gains start to diminish after a certain point, so customers are focusing on smaller, fine-tuned models for their specific needs.
  • AWS provides a range of AI-optimized instances, including Graviton CPUs, GPUs, and its own AI accelerators, Trainium and Inferentia, programmed through the Neuron SDK.

Autodesk's AI Journey

  • Autodesk's mission is to empower innovators to design and create anything, from buildings to VFX.
  • Autodesk's generative AI models are used across architecture, construction, entertainment, and manufacturing, leading to a 200% increase in their deep learning compute needs over the past two years.
  • Autodesk's deep learning workflow involves data ingestion, processing, model training, and deployment on an AWS-based infrastructure using EKS, Ray, and observability tools.
  • Autodesk has been at the forefront of 3D generative model research, publishing in top conferences like CVPR and ICML.

Optimizing Autodesk Bernini on AWS Neuron

  • Autodesk's Bernini is a 3D generative model that can create 3D objects from various inputs like images, point clouds, and natural language.
  • The Bernini pipeline involves context encoding, reverse diffusion, wavelet transformation, and mesh generation.
  • Autodesk collaborated with AWS to optimize inference of the Bernini model on AWS AI accelerator chips using the Neuron SDK.
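The staged structure described above can be sketched as a chain of functions. This is an illustrative toy pipeline only; the stage names follow the summary, but the function bodies are hypothetical placeholders, not Autodesk's actual Bernini code.

```python
# Toy sketch of a four-stage generative pipeline:
# context encoding -> reverse diffusion -> inverse wavelet transform -> mesh generation.
# Every computation below is a placeholder standing in for a real model stage.

def encode_context(prompt):
    # Stand-in for embedding an image, point cloud, or text prompt into a latent.
    return {"latent": len(prompt) * 100}

def reverse_diffusion(context, steps=4):
    # Stand-in for iteratively denoising the latent over several diffusion steps.
    latent = context["latent"]
    for _ in range(steps):
        latent = latent // 2  # placeholder "denoising" update
    return {"wavelet_volume": latent}

def inverse_wavelet(volume):
    # Stand-in for transforming wavelet coefficients back into a 3D field.
    return {"sdf": volume["wavelet_volume"]}

def extract_mesh(field):
    # Stand-in for mesh extraction (e.g. marching cubes over a distance field).
    return {"mesh": f"mesh_from_{field['sdf']}"}

def bernini_pipeline(prompt):
    return extract_mesh(inverse_wavelet(reverse_diffusion(encode_context(prompt))))

print(bernini_pipeline("a wooden chair"))
```

Keeping the stages as separate callables mirrors how the talk describes compiling each pipeline component independently later on.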

Understanding AWS Neuron

  • AI accelerators like Neuron are needed to efficiently handle the massive matrix multiplication workloads of deep learning models.
  • Neuron uses a systolic array architecture to enable cheaper, simpler, and more power-efficient hardware compared to general-purpose CPUs or GPUs.
  • Neuron has specialized engines for different types of operations, including tensor, vector, scalar, and general-purpose engines, providing flexibility and performance.
  • Neuron integrates with PyTorch through the PyTorch Neuron (torch-neuronx) package, simplifying the developer experience.
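The systolic-array idea above can be illustrated with a small simulation: each processing element (PE) only ever performs a multiply-accumulate on the two values streamed past it, with no instruction fetch or cache lookups, which is what makes the hardware simpler and more power-efficient. This is a conceptual sketch, not a description of the actual Trainium/Inferentia design.

```python
# Toy simulation of an output-stationary systolic array computing C = A x B.
# PE (i, j) holds one accumulator; at "cycle" k it sees A[i][k] arriving from
# the left and B[k][j] arriving from above, and does one multiply-accumulate.

def systolic_matmul(A, B):
    n = len(A)  # A, B are n x n matrices (lists of lists)
    acc = [[0] * n for _ in range(n)]  # one accumulator per PE
    for k in range(n):          # stream one diagonal of inputs per cycle
        for i in range(n):
            for j in range(n):
                acc[i][j] += A[i][k] * B[k][j]  # the only op a PE performs
    return acc

A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
print(systolic_matmul(A, B))  # [[19, 22], [43, 50]]
```

The point of the grid layout is that all n² multiply-accumulates in the inner loops happen in parallel in hardware, so an n x n matmul takes on the order of n cycles rather than n³ operations in sequence.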

Deploying Bernini on AWS Neuron

  • Autodesk broke down the Bernini pipeline into individual PyTorch models and compiled them separately using the Neuron Trace API.
  • The compiled model artifacts were then deployed on Autodesk's Kubernetes-based infrastructure using Ray Serve and Argo CD.
  • Compared to running Bernini on other instances, Neuron provided up to 28% cost savings per 1 million inferences, with better latency and throughput.
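The cost-per-million-inferences comparison above boils down to simple arithmetic: hourly instance price divided by hourly throughput. The prices and throughput figures below are hypothetical placeholders chosen only to illustrate how a roughly 28% saving can arise; they are not numbers from the talk.

```python
# Back-of-the-envelope cost comparison across two instance types.
# All prices and throughput numbers are hypothetical illustrations.

def cost_per_million(hourly_price_usd, inferences_per_second):
    inferences_per_hour = inferences_per_second * 3600
    return hourly_price_usd / inferences_per_hour * 1_000_000

gpu_cost = cost_per_million(hourly_price_usd=1.20, inferences_per_second=50)
neuron_cost = cost_per_million(hourly_price_usd=0.99, inferences_per_second=57)
savings = 1 - neuron_cost / gpu_cost

print(f"GPU: ${gpu_cost:.2f}/1M, Neuron: ${neuron_cost:.2f}/1M, savings: {savings:.0%}")
```

Note that savings come from two levers at once: a lower hourly price and higher throughput, which is why benchmarking with production batch sizes (see the lessons below) matters so much.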

Key Lessons and Takeaways

  • There is no one-size-fits-all solution for AI hardware - different models and workloads may perform better on CPUs, GPUs, or specialized accelerators like Neuron.
  • Model architecture and production settings (batch size, sequence length, etc.) have a significant impact on performance and cost-efficiency.
  • Flexibility and the ability to choose the right hardware for the right workload are crucial for long-term AI infrastructure success.
  • Specialized hardware like Neuron will continue to play an important role in scaling AI compute efficiency as Moore's Law slows down.
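The batch-size lesson above can be made concrete with a simple queuing-style model: larger batches improve hardware utilization and throughput, but every request in a batch waits for the whole batch to finish, raising per-request latency. The cost model and all numbers are hypothetical, for illustration only.

```python
# Hypothetical model of the batch-size tradeoff: fixed per-batch overhead
# plus linear per-item compute time. Numbers are illustrative, not measured.

def latency_ms(batch_size, per_item_ms=5.0, fixed_overhead_ms=20.0):
    # The whole batch must finish before any single request returns.
    return fixed_overhead_ms + per_item_ms * batch_size

def throughput_rps(batch_size):
    # Requests completed per second at a given batch size.
    return batch_size / (latency_ms(batch_size) / 1000.0)

for bs in (1, 8, 32):
    print(f"batch={bs:2d}  latency={latency_ms(bs):6.1f} ms  "
          f"throughput={throughput_rps(bs):6.1f} req/s")
```

Under this toy model, going from batch 1 to batch 32 multiplies throughput by more than 4x while latency grows about 7x, which is exactly why the right operating point depends on the workload and the hardware being benchmarked.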
