TalksAWS re:Invent 2025 - Data protection strategies for AI data foundation (AIM339)

AWS re:Invent 2025 - Data protection strategies for AI data foundation (AIM339)

AWS re:Invent 2025 - Data Protection Strategies for AI Data Foundation

Overview

This presentation from AWS re:Invent 2025 focuses on strategies for securing and protecting sensitive data used in AI and machine learning applications, particularly in the context of a nonprofit healthcare chatbot. The speakers, Derek Martinez and Sabrina Petruso, outline a comprehensive data protection framework and demonstrate the implementation of key security and privacy controls through a live coding example.

Key Challenges Addressed

  • Handling sensitive patient data in AI/ML applications
  • Mitigating risks of prompt injection attacks on language models
  • Ensuring data privacy and compliance (e.g. HIPAA) in AI data pipelines

Data Protection Framework

The presenters outline a 6-layer data protection framework:

1. Encryption

  • Encrypt data at rest and in transit using AWS KMS

2. Fine-Grained Access Control

  • Leverage IAM to implement least-privilege access controls

3. Auditing and Monitoring

  • Use AWS CloudTrail to log and audit all actions

4. Automated Compliance

  • Leverage AWS Config to define and monitor compliance rules (e.g. HIPAA)

5. PII Detection and Sanitization

  • Utilize Amazon Textract and Amazon Comprehend to detect and mask PII
  • Implement differential privacy techniques like k-anonymity and randomization

6. Prompt Injection Defense

  • Detect and mitigate potential prompt injection attacks on the chatbot

Live Coding Example

The presenters walk through a live coding example of the data protection pipeline built on Amazon SageMaker:

1. Data Ingestion

  • Internal data owner uploads raw patient data to an S3 bucket

2. Data Processing

  • Amazon Textract extracts text from documents
  • Amazon Comprehend detects and masks PII using differential privacy
  • Processed data is stored in a separate S3 bucket

3. Prompt Injection Defense

  • API Gateway exposes a backend Lambda function to process user prompts
  • Lambda function checks for and mitigates potential prompt injection attacks
  • AWS Config applies HIPAA-specific compliance rules

4. Auditing and Monitoring

  • All actions are logged to a secure S3 bucket via AWS CloudTrail
  • Audit logs track PII detection, masking, and prompt injection events

Business Impact and Use Cases

  • Enables secure and compliant use of sensitive data in AI/ML applications
  • Protects against data breaches and privacy violations
  • Allows organizations to leverage AI while maintaining data governance
  • Particularly relevant for industries like healthcare, finance, and government

Key Takeaways

  • Comprehensive data protection is crucial for responsible AI development
  • Layered security approach with encryption, access control, auditing, and compliance is essential
  • Differential privacy techniques like k-anonymity can effectively mask PII
  • Prompt injection is a significant risk for language model-based applications
  • The presented framework and code can be adapted for various AI use cases

Your Digital Journey deserves a great story.

Build one with us.

Cookies Icon

These cookies are used to collect information about how you interact with this website and allow us to remember you. We use this information to improve and customize your browsing experience, as well as for analytics.

If you decline, your information won’t be tracked when you visit this website. A single cookie will be used in your browser to remember your preference.