Build multimodal ASL avatars with bidirectional translation (DEV306)

Here is a detailed summary of the video transcription in markdown format, broken into sections for better readability:

Introduction

  • The session discusses how to build multimodal American Sign Language (ASL) avatars with bi-directional translation capabilities.
  • The presenters are Alak Eswaradass (Principal Solutions Architect at AWS), Suresh Poopandi (Principal Solutions Architect at AWS), and Rob Koch (Principal Data Engineer at Slalom).
  • They aim to address the challenges faced by Deaf and hard of hearing users when accessing information and communicating with hearing people.

Sign Languages and Challenges

  • Sign languages, such as American Sign Language (ASL), are the primary language for Deaf and hard of hearing users.
  • Sign languages use hand gestures, body movements, and facial expressions to communicate.
  • Relying solely on captions or subtitles can be limiting as they may not capture the nuances and emotional aspects of the conversation.
  • There is a global shortage of sign language interpreters, which creates accessibility challenges.

Sign Language Avatars

  • Sign language avatars are AI-powered digital agents that can engage in conversations and provide sign language interpretation.
  • Two main use cases for sign language avatars:
    1. Narrative avatars: Translate audio/video content into sign language in real-time.
    2. Conversational avatars: Facilitate conversations between Deaf/hard of hearing users and hearing users.
  • Customization and inclusive communication are important aspects of the sign language avatars.

Technical Solution: GenASL

  • GenASL is a generative AI-powered application that enables visual communication for individuals who rely on it.
  • It has two main flows:
    1. Sign language video generation: Converts English audio to ASL avatar video.
    2. Video detection: Converts ASL video to English text and audio.
  • The solution leverages various multimodal AI models and services, such as Amazon Transcribe, Anthropic's Claude 3.5 Sonnet, and Meta's Llama 3.2 Vision Instruct.
  • The architecture follows a decoupled approach, allowing for customization and integration of different foundational models.

Use Cases and Customization

  • The sign language avatars can be applied in various industry sectors, such as healthcare, finance, media, and education.
  • Common approaches for customizing foundational models include prompt engineering, retrieval-augmented generation, fine-tuning, and continued pre-training.
  • The GenASL solution has leveraged fine-tuning techniques to adapt the models to the specific use cases.

Demonstrations

  • The presenters showcase four demo scenarios:
    1. Narrative avatar: Translating a training video from audio to ASL avatar.
    2. Narrative avatar: Translating a presentation from audio to ASL avatar.
    3. Conversational avatar: Facilitating a check-in conversation at a hotel.
    4. Conversational avatar: Assisting a customer in finding coffee machines at a retail store.

Architecture and Implementation

  • The solution's architecture is divided into two main parts:
    1. ASL video generation flow:
      • Converts English audio to text, then to ASL Gloss, and finally to smooth ASL avatar video.
    2. Video detection flow:
      • Converts ASL video to English text and then to English audio.
  • The presenters discuss the use of various models and services, such as Amazon Transcribe, Anthropic's Claude 3.5 Sonnet, Stable Diffusion, and Meta's Llama 3.2 Vision Instruct.
  • They also share best practices for integrating the solution, such as considerations for live streaming, leveraging Amplify Gen2, and utilizing Bedrock features.

Future Developments and Conclusion

  • Plans for future development include adding multilingual support and exploring the potential of unified multimodal models.
  • The presenters encourage attendees to explore the existing resources, such as the previous year's AWS ML blog and the upcoming Chalk Talk session, to learn more about the solution.

Your Digital Journey deserves a great story.

Build one with us.

Cookies Icon

These cookies are used to collect information about how you interact with this website and allow us to remember you. We use this information to improve and customize your browsing experience, as well as for analytics.

If you decline, your information won’t be tracked when you visit this website. A single cookie will be used in your browser to remember your preference.

Talk to us