AWS re:Invent 2025 - Illumina DRAGEN pipelines on F2 instances with Nextflow & AWS Batch (CMP353)
Overview
Collaboration between AWS and AstraZeneca to run genomics pipelines on the latest generation of FPGA-based Amazon EC2 F2 instances
Focused on performance, cost, and result reproducibility for AstraZeneca's high-throughput genomics workloads
Why Run Genomics Pipelines on AWS?
AstraZeneca has standardized genomics workflows and pipelines across sequencing modalities (e.g., exome and whole genome)
Need to demonstrate result reproducibility when upgrading or migrating these workflows
Massive data volumes (petabytes, millions of files) require cost-effective storage and retrieval
Workloads are highly spiky, requiring rapid processing of large sample batches
Benchmarking F1 vs. F2 Instances
Ran identical Illumina DRAGEN pipelines on F1 (prior generation) and F2 (latest generation) instances
Observed up to 62% performance improvement on F2 instances
Saw 71% reduction in compute costs on F2 instances
Confirmed complete result equivalence between F1 and F2 runs using bioinformatics tools
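The summary does not say how the two percentages were derived. One common reading is a relative reduction in runtime and in per-run compute cost, which can be sketched as below. All input numbers and the names `percent_reduction` and `run_cost` are hypothetical, chosen only to show the arithmetic; they are not the benchmark figures from the talk.

```python
def percent_reduction(baseline: float, new: float) -> float:
    """Return the percentage by which `new` is lower than `baseline`."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (baseline - new) / baseline * 100.0


def run_cost(runtime_hours: float, hourly_price: float) -> float:
    """Compute cost of one run: wall-clock hours times the instance's hourly price."""
    return runtime_hours * hourly_price


# Hypothetical inputs (NOT measurements from the talk).
f1_hours, f1_price = 2.0, 10.0  # prior-generation run
f2_hours, f2_price = 1.0, 8.0   # latest-generation run

runtime_saving = percent_reduction(f1_hours, f2_hours)
cost_saving = percent_reduction(
    run_cost(f1_hours, f1_price),
    run_cost(f2_hours, f2_price),
)

print(f"runtime reduction: {runtime_saving:.1f}%")
print(f"cost reduction:    {cost_saving:.1f}%")
```

Under this reading, the cost reduction can exceed the runtime reduction whenever the newer instance is also cheaper per hour, since both factors multiply into the per-run cost.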
Technical Architecture
Utilized AWS Batch for containerized job orchestration and dynamic provisioning
Leveraged Nextflow for workflow automation and Seqera Platform for web-based job monitoring
Configured separate Batch compute environments for exome (DRAGEN 4.3.6) and genome (DRAGEN 3.7.8) pipelines
Staged input data in S3, with local NVMe storage on F2 instances for performance
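As a rough illustration of the "separate compute environments" idea, the sketch below builds one request payload per pipeline in the shape accepted by the AWS Batch `CreateComputeEnvironment` API. It only constructs dictionaries and makes no AWS calls. The environment names, the `f2.6xlarge` instance type, the vCPU limit, and every subnet, security group, and role identifier are placeholders assumed for the example, not details from the talk.

```python
def compute_environment_request(name, instance_types, max_vcpus,
                                subnets, security_group_ids, instance_role):
    """Build a managed EC2 compute environment request for AWS Batch.

    minvCpus is set to 0 so the environment can scale down to no instances
    between sample batches, which suits spiky workloads.
    """
    return {
        "computeEnvironmentName": name,
        "type": "MANAGED",
        "state": "ENABLED",
        "computeResources": {
            "type": "EC2",
            "minvCpus": 0,
            "maxvCpus": max_vcpus,
            "instanceTypes": instance_types,
            "subnets": subnets,
            "securityGroupIds": security_group_ids,
            "instanceRole": instance_role,
        },
    }


# Placeholder network and IAM settings (hypothetical values).
shared = {
    "instance_types": ["f2.6xlarge"],
    "max_vcpus": 240,
    "subnets": ["subnet-PLACEHOLDER"],
    "security_group_ids": ["sg-PLACEHOLDER"],
    "instance_role": "ecsInstanceRole-PLACEHOLDER",
}

# One environment per pipeline, so each can be pinned to its own DRAGEN setup.
requests = {
    pipeline: compute_environment_request(f"dragen-{pipeline}-f2", **shared)
    for pipeline in ("exome", "genome")
}

# To submit for real (requires credentials and valid identifiers):
#   import boto3
#   boto3.client("batch").create_compute_environment(**requests["exome"])
```

Keeping one environment per pipeline is one way to let each use a different machine image or container for its DRAGEN version; how the versions were actually pinned is not described in the summary.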
Key Learnings
Equivalence testing is crucial to build trust with end users when migrating genomics workflows
Keeping data and compute in the same AWS region is important for regulatory, data residency, and cost reasons
Local NVMe storage on F2 instances can provide similar performance to EBS RAID configurations recommended in Illumina documentation
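The summary does not name the bioinformatics tools used for equivalence testing. As a minimal stand-in, the sketch below compares two VCF outputs by hashing every line except the `##` meta-information header, which typically carries run-specific details such as dates and command lines. The function names are hypothetical, and a byte-level comparison like this is stricter and less informative than a dedicated variant-comparison tool.

```python
import gzip
import hashlib


def vcf_body_digest(path):
    """SHA-256 of a VCF file's content, skipping '##' meta-information lines.

    The '#CHROM' column header and all variant records are included.
    Handles plain-text and gzip-compressed (.gz) files.
    """
    opener = gzip.open if str(path).endswith(".gz") else open
    digest = hashlib.sha256()
    with opener(path, "rt") as handle:
        for line in handle:
            if line.startswith("##"):
                continue  # run-specific metadata, not part of the results
            digest.update(line.encode("utf-8"))
    return digest.hexdigest()


def runs_equivalent(vcf_a, vcf_b):
    """True if both VCFs have identical column headers and variant records."""
    return vcf_body_digest(vcf_a) == vcf_body_digest(vcf_b)
```

For example, `runs_equivalent("f1_run/sample.vcf.gz", "f2_run/sample.vcf.gz")` (hypothetical paths) would return `True` only when the records match exactly, including their order.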
Business Impact
Up to 62% performance improvement and 71% compute cost reduction enable AstraZeneca to process more samples in less time
Cost savings can be reinvested into other areas of the business to drive scientific innovation
Ability to rapidly scale up and down compute resources as needed for spiky genomics workloads
Example Use Case
AstraZeneca runs tens of thousands of genomics samples per month, with large batches arriving at irregular intervals
By combining F2 instances with AWS Batch and Nextflow, AstraZeneca can process these batches faster and at lower compute cost