AWS re:Invent 2025 - Data protection strategies for AI data foundation (AIM339)

Overview

This presentation from AWS re:Invent 2025 focuses on strategies for securing and protecting sensitive data used in AI and machine learning applications, particularly in the context of a nonprofit healthcare chatbot. The speakers, Derek Martinez and Sabrina Petruso, outline a comprehensive data protection framework and demonstrate the implementation of key security and privacy controls through a live coding example.

Key Challenges Addressed

Handling sensitive patient data in AI/ML applications
Mitigating risks of prompt injection attacks on language models
Ensuring data privacy and compliance (e.g. HIPAA) in AI data pipelines

Data Protection Framework

The presenters outline a six-layer data protection framework:

1. Encryption

Encrypt data at rest and in transit, using AWS KMS to manage the encryption keys
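
As a minimal sketch of the at-rest half of this layer, the helper below builds the arguments for an S3 `put_object` call that requests server-side encryption with a KMS key. The bucket name, object key, and key alias are hypothetical placeholders rather than values from the talk, and the client is passed in so the function can be exercised without AWS credentials.

```python
from typing import Any, Dict


def encrypted_put_kwargs(bucket: str, key: str, body: bytes, kms_key_id: str) -> Dict[str, Any]:
    """Build put_object arguments that request SSE-KMS encryption at rest."""
    return {
        "Bucket": bucket,
        "Key": key,
        "Body": body,
        "ServerSideEncryption": "aws:kms",  # ask S3 to encrypt with a KMS key
        "SSEKMSKeyId": kms_key_id,          # hypothetical key alias or ARN
    }


def upload_encrypted(s3_client: Any, bucket: str, key: str, body: bytes, kms_key_id: str) -> Any:
    """Upload through any client exposing put_object, e.g. boto3.client("s3")."""
    return s3_client.put_object(**encrypted_put_kwargs(bucket, key, body, kms_key_id))
```

With boto3 installed, a call might look like `upload_encrypted(boto3.client("s3"), "raw-patient-data", "intake/form.pdf", data, "alias/patient-data")`, where every name is illustrative.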

2. Fine-Grained Access Control

Leverage IAM to implement least-privilege access controls
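
One way to picture least privilege for this pipeline is a policy that lets the processing role read raw documents, write sanitized output, and use a single KMS key, and nothing else. The sketch below only builds the policy document as JSON; the bucket names, key ARN, and statement IDs are assumptions made for illustration, not the policy shown in the session.

```python
import json
from typing import Any, Dict


def processing_role_policy(raw_bucket: str, processed_bucket: str, kms_key_arn: str) -> Dict[str, Any]:
    """Return an IAM policy document scoped to the two buckets and one key."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "ReadRawDocuments",
                "Effect": "Allow",
                "Action": ["s3:GetObject"],
                "Resource": f"arn:aws:s3:::{raw_bucket}/*",
            },
            {
                "Sid": "WriteSanitizedOutput",
                "Effect": "Allow",
                "Action": ["s3:PutObject"],
                "Resource": f"arn:aws:s3:::{processed_bucket}/*",
            },
            {
                "Sid": "UseDataKey",
                "Effect": "Allow",
                "Action": ["kms:Decrypt", "kms:GenerateDataKey"],
                "Resource": kms_key_arn,
            },
        ],
    }


def policy_json(policy: Dict[str, Any]) -> str:
    """Serialize the policy for attachment through the IAM console, CLI, or SDK."""
    return json.dumps(policy, indent=2)
```

Note that no statement uses a wildcard action or a `*` resource, which is the property the layer is after.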

3. Auditing and Monitoring

Use AWS CloudTrail to log and audit all actions

4. Automated Compliance

Leverage AWS Config to define and monitor compliance rules (e.g. HIPAA)
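
To make this layer concrete, the sketch below prepares requests for the AWS Config `put_config_rule` API using two AWS managed rules. The choice of rules and the rule names are assumptions for illustration; the summary does not list which HIPAA-related rules the presenters enabled.

```python
from typing import Any, Dict, List

# Managed rule identifiers chosen for illustration; not the rule set from the talk.
EXAMPLE_MANAGED_RULES = {
    "s3-encryption-at-rest": "S3_BUCKET_SERVER_SIDE_ENCRYPTION_ENABLED",
    "cloudtrail-enabled": "CLOUD_TRAIL_ENABLED",
}


def managed_rule_requests(rules: Dict[str, str]) -> List[Dict[str, Any]]:
    """Turn {rule name: managed identifier} into put_config_rule keyword arguments."""
    return [
        {
            "ConfigRule": {
                "ConfigRuleName": name,
                "Source": {"Owner": "AWS", "SourceIdentifier": identifier},
            }
        }
        for name, identifier in sorted(rules.items())
    ]


def apply_rules(config_client: Any, rules: Dict[str, str]) -> int:
    """Register each rule through a caller-supplied client, e.g. boto3.client("config")."""
    requests = managed_rule_requests(rules)
    for request in requests:
        config_client.put_config_rule(**request)
    return len(requests)
```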

5. PII Detection and Sanitization

Use Amazon Textract to extract text from documents and Amazon Comprehend to detect PII, then mask it
Apply anonymization techniques such as k-anonymity and randomization (related to, but distinct from, formal differential privacy)
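
A dataset is k-anonymous when every combination of quasi-identifier values (for example age band and partial ZIP code) is shared by at least k records. The following is a minimal sketch of that check, with a simple age-banding helper as the generalization step. The field names and the 10-year band width are assumptions, not details from the session.

```python
from collections import Counter
from typing import Dict, List, Sequence


def generalize_age(age: int, width: int = 10) -> str:
    """Replace an exact age with a band such as "30-39"."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


def is_k_anonymous(records: List[Dict[str, str]], quasi_identifiers: Sequence[str], k: int) -> bool:
    """True when every quasi-identifier combination appears in at least k records."""
    groups = Counter(tuple(record[q] for q in quasi_identifiers) for record in records)
    return all(count >= k for count in groups.values())
```

If a group falls below k, the usual options are to generalize further (wider bands, shorter ZIP prefixes) or to suppress the outlying records.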

6. Prompt Injection Defense

Detect and mitigate potential prompt injection attacks on the chatbot

Live Coding Example

The presenters walk through a live coding example of the data protection pipeline built on Amazon SageMaker:

1. Data Ingestion

Internal data owner uploads raw patient data to an S3 bucket

2. Data Processing

Amazon Textract extracts text from documents
Amazon Comprehend detects PII, which is then masked using the anonymization techniques from layer 5
Processed data is stored in a separate S3 bucket
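
The masking step in this stage can be sketched as follows. Amazon Comprehend's `detect_pii_entities` call returns entities with `Type`, `Score`, `BeginOffset`, and `EndOffset` fields; the helper replaces each detected span with a type placeholder. The 0.5 score threshold and the bracketed placeholder format are assumptions, and the code presumes the spans do not overlap.

```python
from typing import Any, Dict, List


def detect_pii(comprehend_client: Any, text: str) -> List[Dict[str, Any]]:
    """Call DetectPiiEntities through a caller-supplied client, e.g. boto3.client("comprehend")."""
    response = comprehend_client.detect_pii_entities(Text=text, LanguageCode="en")
    return response["Entities"]


def mask_pii(text: str, entities: List[Dict[str, Any]], min_score: float = 0.5) -> str:
    """Replace each confident PII span with a placeholder such as [NAME]."""
    masked = text
    # Work from the end of the string so earlier offsets stay valid.
    for entity in sorted(entities, key=lambda e: e["BeginOffset"], reverse=True):
        if entity.get("Score", 1.0) < min_score:
            continue  # skip low-confidence detections
        placeholder = f"[{entity['Type']}]"
        masked = masked[: entity["BeginOffset"]] + placeholder + masked[entity["EndOffset"]:]
    return masked
```

The masked string is what would be written to the separate processed-data bucket, so the raw text never leaves the ingestion bucket.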

3. Prompt Injection Defense

API Gateway exposes a backend Lambda function to process user prompts
Lambda function checks for and mitigates potential prompt injection attacks
AWS Config applies HIPAA-specific compliance rules
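
The Lambda-side check might look roughly like the sketch below, which assumes an API Gateway proxy event whose `body` is a JSON string with a `prompt` field. The patterns are a small illustrative deny-list, not the detection logic from the talk; pattern matching alone catches only the most obvious attempts and would normally be one filter among several.

```python
import json
import re
from typing import Any, Dict, List

# Hypothetical deny-list of phrasings commonly seen in injection attempts.
SUSPICIOUS_PATTERNS = [
    re.compile(r"ignore (all |any )?(previous|prior|above) instructions", re.IGNORECASE),
    re.compile(r"reveal (your |the )?(system|hidden) prompt", re.IGNORECASE),
    re.compile(r"disregard (your|the) (rules|guidelines)", re.IGNORECASE),
]


def find_injection(prompt: str) -> List[str]:
    """Return the patterns that match the prompt (empty list means no hit)."""
    return [p.pattern for p in SUSPICIOUS_PATTERNS if p.search(prompt)]


def lambda_handler(event: Dict[str, Any], context: Any) -> Dict[str, Any]:
    """Reject suspicious prompts before they reach the model."""
    body = json.loads(event.get("body") or "{}")
    prompt = body.get("prompt", "")
    hits = find_injection(prompt)
    if hits:
        # Do not echo the prompt back; report only that it was rejected.
        return {"statusCode": 400, "body": json.dumps({"error": "prompt rejected"})}
    return {"statusCode": 200, "body": json.dumps({"prompt": prompt})}
```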

4. Auditing and Monitoring

All actions are logged to a secure S3 bucket via AWS CloudTrail
Audit logs track PII detection, masking, and prompt injection events
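
Application-level audit events such as these could be emitted as structured JSON lines. The sketch below is one assumed shape for such a record: it stores a SHA-256 digest of the payload instead of the text itself, so the audit trail does not become another copy of the PII. The event names and field names are hypothetical.

```python
import hashlib
import json
from datetime import datetime, timezone
from typing import Optional

# Hypothetical event vocabulary covering the three event kinds named above.
ALLOWED_EVENTS = {"pii_detected", "pii_masked", "prompt_injection_blocked"}


def audit_record(
    event_type: str,
    document_id: str,
    payload: str,
    entity_count: int = 0,
    now: Optional[datetime] = None,
) -> str:
    """Return one JSON audit line that references the payload only by hash."""
    if event_type not in ALLOWED_EVENTS:
        raise ValueError(f"unknown audit event: {event_type}")
    timestamp = (now or datetime.now(timezone.utc)).isoformat()
    record = {
        "timestamp": timestamp,
        "event_type": event_type,
        "document_id": document_id,
        "entity_count": entity_count,
        "payload_sha256": hashlib.sha256(payload.encode("utf-8")).hexdigest(),
    }
    return json.dumps(record, sort_keys=True)
```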

Business Impact and Use Cases

Enables secure and compliant use of sensitive data in AI/ML applications
Protects against data breaches and privacy violations
Allows organizations to leverage AI while maintaining data governance
Particularly relevant for industries like healthcare, finance, and government

Key Takeaways

Comprehensive data protection is crucial for responsible AI development
A layered security approach that combines encryption, access control, auditing, and compliance is essential
Anonymization techniques such as k-anonymity complement PII masking by reducing re-identification risk
Prompt injection is a significant risk for language model-based applications
The presented framework and code can be adapted for various AI use cases
