AWS re:Invent 2025 - Accelerate & automate secure data transfers at scale with AWS DataSync (STG340)

Accelerating Secure Data Transfers at Scale with AWS DataSync

Overview

  • Enterprises are creating exabytes of data every day, distributed across on-premises, edge, and multi-cloud environments
  • This creates challenges around data migration, governance, security, and performance at scale
  • AWS DataSync is a fully managed data transfer service designed to address these challenges

Key Use Cases for DataSync

  1. Migrations: Quickly and easily migrate file and object data from on-premises or other clouds to AWS
  2. Replication: Create secondary copies of data for disaster recovery
  3. Archive: Move cold, infrequently accessed data to cost-effective AWS storage like S3 Glacier
  4. Accelerated Workflows: Enable high-speed data transfers to support business-critical workloads

DataSync Capabilities

  • Supports data movement from on-premises storage, other clouds, and between AWS services
  • Preserves file metadata like permissions, timestamps, and ACLs during transfers
  • Provides advanced features like flexible scheduling, bandwidth control, and detailed reporting
  • Fully managed service that handles the underlying infrastructure and network optimization
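
The scheduling and bandwidth-control features above map onto task settings. The following is a minimal sketch, not the speakers' configuration: it assumes the boto3 DataSync client's create_task parameters (Options.BytesPerSecond and Schedule.ScheduleExpression), and the helper name build_task_request, the ARNs, and the cron expression are hypothetical placeholders.

```python
def build_task_request(source_arn, dest_arn, limit_mbps=None, cron=None):
    """Assemble keyword arguments for a DataSync create_task call.

    limit_mbps: optional bandwidth cap in megabits per second.
    cron:       optional schedule expression, e.g. "cron(0 2 * * ? *)".
    """
    request = {
        "SourceLocationArn": source_arn,
        "DestinationLocationArn": dest_arn,
    }
    options = {}
    if limit_mbps is not None:
        # The bandwidth option is expressed in bytes per second,
        # so convert from megabits per second.
        options["BytesPerSecond"] = int(limit_mbps * 1_000_000 // 8)
    if options:
        request["Options"] = options
    if cron is not None:
        request["Schedule"] = {"ScheduleExpression": cron}
    return request


# Hypothetical ARNs; substitute the locations created in your account.
request = build_task_request(
    "arn:aws:datasync:us-east-1:111122223333:location/loc-source-example",
    "arn:aws:datasync:us-east-1:111122223333:location/loc-dest-example",
    limit_mbps=100,
    cron="cron(0 2 * * ? *)",
)
# With boto3 installed and credentials configured, the request could be sent as:
#   import boto3
#   boto3.client("datasync").create_task(**request)
```

Keeping the request construction separate from the API call makes the bandwidth conversion easy to check before any task is created.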

DataSync Enhanced Mode

  • Supports transferring a virtually unlimited number of files and objects for S3 and cross-cloud scenarios
  • Increases transfer speeds for large files by breaking them into parallel chunks
  • Simplifies cross-cloud transfers by eliminating the need for agents in the other cloud
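
To illustrate the general idea of splitting a large file into chunks that are copied in parallel, here is a small sketch. It is not DataSync's internal implementation, which the talk summary does not describe; the function names, chunk size, and worker count are assumptions chosen for illustration, and an in-memory buffer stands in for a real file.

```python
from concurrent.futures import ThreadPoolExecutor


def chunk_ranges(size, chunk_size):
    """Return inclusive (start, end) byte ranges covering `size` bytes."""
    return [
        (start, min(start + chunk_size, size) - 1)
        for start in range(0, size, chunk_size)
    ]


def parallel_copy(data, chunk_size, workers=4):
    """Copy `data` chunk by chunk using a thread pool, then return the result."""
    out = bytearray(len(data))

    def copy_range(byte_range):
        start, end = byte_range
        # Each worker writes only its own non-overlapping slice.
        out[start:end + 1] = data[start:end + 1]

    with ThreadPoolExecutor(max_workers=workers) as pool:
        # list() forces completion and surfaces any worker exception.
        list(pool.map(copy_range, chunk_ranges(len(data), chunk_size)))
    return bytes(out)


ranges = chunk_ranges(10, 4)
copied = parallel_copy(b"abcdefghij", chunk_size=4)
```

Because the ranges do not overlap, the chunks can be transferred in any order and the destination still ends up identical to the source.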

Path AI's Use Case

  • Path AI digitizes pathology workflows by converting glass slides to digital images
  • This generates massive amounts of data (gigabytes per slide) that need to be securely transferred to the cloud
  • Path AI used DataSync to build a seamless data pipeline, allowing labs to push data to S3 without complex IT setups
  • This enabled Path AI to onboard labs in the US, Europe, and South America, moving petabytes of data to power its AI-based pathology platform

Optimizing DataSync Performance

  1. DataSync Agent Deployment: Agents can run as EC2 instances or as on-premises VMs; running the agent on-premises is recommended for low-latency access to storage
  2. Testing and Validation: Run small-scale tests to validate connectivity, performance, and error recovery before full migrations
  3. Scaling with Multiple Tasks: Partition data sets and run multiple parallel DataSync tasks to maximize available network bandwidth
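
The partitioning step above can be sketched as follows. This is one possible approach under stated assumptions, not the method shown in the talk: top-level directories are balanced across tasks by size with a greedy heuristic, and each group becomes a pipe-delimited include filter of the kind DataSync tasks accept. The directory names, sizes, and helper names are hypothetical.

```python
import heapq


def partition_by_size(dir_sizes, task_count):
    """Greedily assign directories (largest first) to the lightest group."""
    heap = [(0, index) for index in range(task_count)]  # (total bytes, group index)
    groups = [[] for _ in range(task_count)]
    ordered = sorted(dir_sizes.items(), key=lambda item: (-item[1], item[0]))
    for path, size in ordered:
        total, index = heapq.heappop(heap)
        groups[index].append(path)
        heapq.heappush(heap, (total + size, index))
    return [group for group in groups if group]


def include_filter(paths):
    """Build an include filter; patterns are joined with '|'."""
    return {"FilterType": "SIMPLE_PATTERN", "Value": "|".join(sorted(paths))}


# Hypothetical top-level directories and their sizes in gigabytes.
sizes = {"/a": 50, "/b": 30, "/c": 20, "/d": 10}
groups = partition_by_size(sizes, task_count=2)
filters = [include_filter(group) for group in groups]
```

Each filter could then be supplied as the Includes setting of a separate task or task execution, so the groups transfer in parallel. Balancing by size rather than by directory count keeps one oversized directory from dominating the total run time.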

Key Takeaways

  • DataSync provides a fully managed, secure, and scalable solution for large-scale data migrations and transfers
  • Enhanced mode enables increased performance and simplified cross-cloud workflows
  • Path AI used DataSync to build a robust data pipeline, enabling digital pathology workflows powered by cloud-based AI
  • DataSync can be optimized through agent placement, testing, and parallel task execution to achieve high-speed data transfers
