WEAP: An automatic and accelerated pipeline for analysing multi-sample whole exome sequencing data
Author:
Sarma Ranjan Jyoti, Nachimuthu
Abstract
Background
Whole exome sequencing (WES) is commonly used for SNP discovery in the coding regions of the human genome and has a wide range of clinical applications. Because WES data analysis is computationally intensive and time-consuming, automation is key to simplifying it and making it straightforward.
Method
The WEAP workflow runs from alignment of FASTQ files to a reference genome, through variant calling, to annotation, all without user intervention. WEAP follows the GATK workflow and incorporates popular NGS analysis tools such as bwa-mem2, samtools, GATK, bcftools, and ANNOVAR, coupled with GNU parallel.
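The per-sample flow described above can be sketched roughly as follows. This is a minimal illustration and not WEAP's actual code: the function names `plan_sample` and `run_all`, the file-naming scheme, and the core/job split are all assumptions, and Python's `concurrent.futures` stands in for GNU parallel. Read-group tagging, base recalibration, and annotation are omitted for brevity. By default the sketch performs a dry run that only returns the command strings.

```python
from concurrent.futures import ThreadPoolExecutor


def plan_sample(sample: str, ref: str, threads: int) -> list[str]:
    """Return the shell commands for one sample, in execution order.

    File names such as `<sample>_R1.fastq.gz` are hypothetical.
    """
    bam = f"{sample}.sorted.bam"
    dedup = f"{sample}.dedup.bam"
    return [
        # Align paired-end reads and coordinate-sort the alignments.
        f"bwa-mem2 mem -t {threads} {ref} {sample}_R1.fastq.gz {sample}_R2.fastq.gz"
        f" | samtools sort -@ {threads} -o {bam} -",
        # Mark duplicate reads.
        f"gatk MarkDuplicates -I {bam} -O {dedup} -M {sample}.metrics.txt",
        # Call germline variants.
        f"gatk HaplotypeCaller -R {ref} -I {dedup} -O {sample}.vcf.gz",
    ]


def run_all(samples, ref, total_cores=8, jobs=2, runner=None):
    """Process several samples concurrently; steps within a sample stay serial.

    `runner` receives each command string. The default is a dry run that
    simply returns the command, so nothing is executed.
    """
    # Split the core budget between concurrent jobs and per-tool threads.
    threads = max(1, total_cores // jobs)
    run = runner or (lambda cmd: cmd)

    def process(sample):
        return [run(cmd) for cmd in plan_sample(sample, ref, threads)]

    with ThreadPoolExecutor(max_workers=jobs) as pool:
        # pool.map preserves the input order of the samples.
        return dict(zip(samples, pool.map(process, samples)))


if __name__ == "__main__":
    for name, commands in run_all(["S1", "S2"], "ref.fa").items():
        print(name)
        for command in commands:
            print("  ", command)
```

To execute the commands instead of listing them, one could pass something like `runner=lambda cmd: subprocess.run(cmd, shell=True, check=True)`, provided the tools are installed and on the `PATH`.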
Results
WEAP successfully identified and annotated germline and somatic variants. In germline variant discovery, the major steps (aligning to the reference genome, converting files, and removing duplicates) ran 1.5 to 3.6 times faster in parallel mode than in serial mode. In tumor analysis, creating a panel of normals (PoN) from 40 samples was about 3 times faster in parallel mode. Individual steps of tumor-only analysis were 1.4 to 7.7 times faster. When tumor samples were compared with matched normal tissues, run time fell by a factor of 1.8 to 3.6.
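For readers unfamiliar with the "times faster" figures, a fold speedup is simply the serial run time divided by the parallel run time for the same step. The snippet below is a small illustration of that arithmetic. The function name `fold_speedup` is an assumption, and the timings are hypothetical placeholders, not measurements from this study.

```python
def fold_speedup(serial_seconds: float, parallel_seconds: float) -> float:
    """Return how many times faster the parallel run is than the serial run."""
    if serial_seconds <= 0 or parallel_seconds <= 0:
        raise ValueError("timings must be positive")
    return serial_seconds / parallel_seconds


if __name__ == "__main__":
    # Hypothetical (serial, parallel) timings in seconds, for illustration only.
    timings = {"step_a": (120.0, 60.0), "step_b": (90.0, 30.0)}
    for step, (serial, parallel) in timings.items():
        print(f"{step}: {fold_speedup(serial, parallel):.1f} times faster")
```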
Conclusions
WEAP accepts quality-controlled (QC) and trimmed FASTQ reads and returns annotated variants, enabling non-bioinformaticians to perform variant calling from WES data without manual intervention. WEAP uses GNU parallel to process multiple samples while leveraging the native parallelism of the integrated tools to speed up the analysis. A comparison between the parallel and serial modes of WEAP suggests that it can be a strong alternative for end-to-end analysis of WES data that integrates the gold-standard GATK Best Practices workflow.
Funder
Department of Biotechnology, Ministry of Science and Technology, India
Publisher
Research Square Platform LLC