AWS re:Invent 2025 - Illumina DRAGEN pipelines on F2 instances with Nextflow & AWS Batch (CMP353)

Overview

  • Collaboration between AWS and AstraZeneca to run genomics pipelines on the latest generation of AWS F2 instances
  • Focused on performance, cost, and result reproducibility for AstraZeneca's high-throughput genomics workloads

Why Run Genomics Pipelines on AWS?

  • AstraZeneca has standardized genomics workflows and pipelines across different modalities (e.g. exome, whole genome)
  • Need to demonstrate result reproducibility when upgrading or migrating these workflows
  • Massive data volumes (petabytes, millions of files) require cost-effective storage and retrieval
  • Workloads are highly spiky, requiring rapid processing of large sample batches

Benchmarking F1 vs. F2 Instances

  • Ran identical Illumina DRAGEN pipelines on F1 (prior generation) and F2 (latest generation) instances
  • Observed up to 62% performance improvement on F2 instances
  • Saw 71% reduction in compute costs on F2 instances
  • Confirmed complete result equivalence between F1 and F2 runs using bioinformatics tools
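
The two headline figures are linked by simple arithmetic, because compute cost per sample is runtime multiplied by the hourly instance price. The sketch below is not from the talk. It assumes per-hour billing and that both figures describe the same runs, and it shows the two common readings of "62% performance improvement" (62% less runtime versus 1.62x throughput), since the summary does not say which was measured. The function names are illustrative.

```python
def runtime_ratio_from_time_reduction(improvement: float) -> float:
    """Reading 1: 'X% faster' means the run takes X% less wall-clock time."""
    return 1.0 - improvement


def runtime_ratio_from_throughput_gain(improvement: float) -> float:
    """Reading 2: 'X% faster' means throughput is (1 + X) times higher."""
    return 1.0 / (1.0 + improvement)


def implied_price_ratio(runtime_ratio: float, cost_reduction: float) -> float:
    """Hourly price ratio (F2 / F1) implied by cost = runtime * hourly price."""
    cost_ratio = 1.0 - cost_reduction
    return cost_ratio / runtime_ratio


if __name__ == "__main__":
    perf, cost = 0.62, 0.71  # figures reported in the benchmark above
    for label, ratio in [
        ("62% less runtime", runtime_ratio_from_time_reduction(perf)),
        ("1.62x throughput", runtime_ratio_from_throughput_gain(perf)),
    ]:
        price = implied_price_ratio(ratio, cost)
        print(f"{label}: runtime x{ratio:.2f}, implied hourly price x{price:.2f}")
```

Under either reading, the cost reduction is larger than the speedup alone would explain, which suggests the F2 configuration was also cheaper per hour than the F1 configuration it replaced. The exact ratio depends on which instance sizes were compared, which the summary does not state.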

Technical Architecture

  • Utilized AWS Batch for containerized job orchestration and dynamic provisioning
  • Leveraged Nextflow for workflow automation and Seqera Platform for web-based job monitoring
  • Configured separate Batch compute environments for exome (DRAGEN 4.3.6) and genome (DRAGEN 3.7.8) pipelines
  • Staged input data in S3, with local NVMe storage on F2 instances for performance
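
The architecture above could be expressed as two AWS Batch managed compute environments, one per pipeline. The following is a minimal sketch, not the configuration used in the talk: it only builds request dictionaries in the shape of the Batch CreateComputeEnvironment API and prints them, without calling AWS. The environment names, the `f2.6xlarge` instance size, the vCPU ceilings, and the subnet, security group, and role placeholders are all assumptions for illustration.

```python
import json

# Hypothetical placeholders: none of these identifiers come from the talk.
SUBNETS = ["subnet-EXAMPLE"]
SECURITY_GROUPS = ["sg-EXAMPLE"]
INSTANCE_ROLE = "ecsInstanceRole"


def compute_environment(pipeline: str, max_vcpus: int) -> dict:
    """Build a request shaped like AWS Batch's CreateComputeEnvironment input."""
    return {
        "computeEnvironmentName": f"dragen-{pipeline}-f2",  # assumed naming scheme
        "type": "MANAGED",
        "state": "ENABLED",
        "computeResources": {
            "type": "EC2",
            "allocationStrategy": "BEST_FIT_PROGRESSIVE",
            "minvCpus": 0,  # scale to zero between sample batches
            "maxvCpus": max_vcpus,  # assumed ceiling, tune per workload
            "instanceTypes": ["f2.6xlarge"],  # assumed F2 size
            "subnets": SUBNETS,
            "securityGroupIds": SECURITY_GROUPS,
            "instanceRole": INSTANCE_ROLE,
        },
    }


# One environment per pipeline, so each can be sized and versioned independently.
ENVIRONMENTS = {
    "exome": compute_environment("exome", max_vcpus=240),
    "genome": compute_environment("genome", max_vcpus=480),
}

if __name__ == "__main__":
    print(json.dumps(ENVIRONMENTS, indent=2))
```

Setting `minvCpus` to 0 lets an environment scale down to no instances between batches, which suits spiky workloads. With boto3, a dictionary in this shape could be passed as `create_compute_environment(**spec)` on a Batch client. The DRAGEN version itself would presumably be pinned by the AMI or container image, which this sketch omits.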

Key Learnings

  1. Equivalence testing is crucial to build trust with end users when migrating genomics workflows
  2. Keeping data and compute in the same AWS region is important for regulatory, data residency, and cost reasons
  3. Local NVMe storage on F2 instances can provide similar performance to EBS RAID configurations recommended in Illumina documentation
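
The talk does not name the bioinformatics tools used for equivalence testing. As a simplified stand-in, the sketch below compares two VCF outputs record by record rather than byte by byte, since header lines can legitimately differ between runs. The helper names and the sample records are made up for illustration, and a real comparison would typically cover more fields and use established VCF tooling.

```python
def parse_variants(vcf_text: str) -> dict:
    """Map (CHROM, POS, REF, ALT) to the first sample's genotype, skipping headers."""
    variants = {}
    for line in vcf_text.splitlines():
        if not line or line.startswith("#"):
            continue  # header and metadata lines may differ between runs
        fields = line.split("\t")
        chrom, pos, _id, ref, alt = fields[:5]
        genotype = None
        if len(fields) > 9:  # FORMAT is column 9, first sample is column 10
            fmt_keys = fields[8].split(":")
            sample_values = fields[9].split(":")
            if "GT" in fmt_keys:
                genotype = sample_values[fmt_keys.index("GT")]
        variants[(chrom, int(pos), ref, alt)] = genotype
    return variants


def compare_runs(vcf_a: str, vcf_b: str) -> dict:
    """Report variants unique to either run and genotype disagreements."""
    a, b = parse_variants(vcf_a), parse_variants(vcf_b)
    shared = a.keys() & b.keys()
    return {
        "only_in_a": sorted(a.keys() - b.keys()),
        "only_in_b": sorted(b.keys() - a.keys()),
        "genotype_mismatches": sorted(k for k in shared if a[k] != b[k]),
    }


def is_equivalent(report: dict) -> bool:
    """True when no differences of any kind were found."""
    return not any(report.values())


# Made-up example data: identical records, different header metadata.
HEADER = "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tSAMPLE1"
RECORDS = [
    "chr1\t12345\t.\tA\tG\t50\tPASS\t.\tGT:DP\t0/1:30",
    "chr2\t67890\t.\tC\tT\t60\tPASS\t.\tGT:DP\t1/1:42",
]
F1_VCF = "\n".join(["##source=run-on-f1", HEADER, *RECORDS])
F2_VCF = "\n".join(["##source=run-on-f2", HEADER, *RECORDS])

if __name__ == "__main__":
    report = compare_runs(F1_VCF, F2_VCF)
    print("equivalent:", is_equivalent(report))
```

Reporting the three difference categories separately, rather than a single pass/fail flag, makes it easier to show end users exactly what changed when a migration is not fully equivalent.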

Business Impact

  • The performance improvement of up to 62% and the 71% reduction in compute costs enable AstraZeneca to process more samples in less time
  • Cost savings can be reinvested into other areas of the business to drive scientific innovation
  • Ability to rapidly scale up and down compute resources as needed for spiky genomics workloads

Example Use Case

  • AstraZeneca runs tens of thousands of genomics samples per month, with large batches arriving at irregular intervals
  • By combining F2 instances with AWS Batch and Nextflow, AstraZeneca can process these large batches faster and at lower cost
