Abstract
Background
Rapid and thorough quality assessment of sequenced genomes at ultra-high-throughput scale is crucial for successful large-scale genomic studies. Comprehensive quality assessment typically requires full genome alignment, which consumes substantial computational resources and turnaround time. Existing tools are either computationally expensive because they perform full alignment, or they skip read alignment and therefore lack essential quality metrics.
Findings
We developed a set of rapid and accurate methods that produce comprehensive quality metrics directly from raw sequence reads without full genome alignment. Our methods achieve turnaround times orders of magnitude faster than existing full-alignment-based methods while providing comprehensive and sophisticated quality metrics, including estimates of genetic ancestry and contamination.
Conclusions
By performing quality assessment rapidly and comprehensively, our tool will help investigators detect potential issues in ultra-high-throughput sequence reads in real time and at low computational cost, ensuring high-quality downstream analysis and preventing unexpected losses of time, money, and invaluable specimens.
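The abstract does not describe the methods themselves. As a generic, hypothetical illustration of the simplest kind of metric that can be computed from raw reads with no alignment step, the Python sketch below streams FASTQ records and reports the read count, mean base quality, fraction of bases at Q30 or above, and GC fraction. It assumes plain 4-line FASTQ records and Phred+33 quality encoding; the names `parse_fastq` and `summarize_reads` are illustrative only. This is not the authors' tool, and it makes no attempt at the ancestry or contamination estimates mentioned in the abstract.

```python
import io
from dataclasses import dataclass
from typing import Iterable, Iterator, TextIO, Tuple


@dataclass
class ReadQCSummary:
    n_reads: int
    n_bases: int
    mean_base_quality: float  # mean Phred score over all bases
    frac_q30: float           # fraction of bases with Phred score >= 30
    gc_fraction: float        # fraction of all bases (including N) that are G or C


def parse_fastq(handle: TextIO) -> Iterator[Tuple[str, str, str]]:
    """Yield (name, sequence, quality) from a 4-line-per-record FASTQ stream (assumed format)."""
    while True:
        header = handle.readline()
        if not header:
            return  # end of input
        seq = handle.readline().strip()
        plus = handle.readline()
        qual = handle.readline().strip()
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError("malformed FASTQ record")
        if len(seq) != len(qual):
            raise ValueError("sequence and quality lengths differ")
        yield header[1:].strip(), seq, qual


def summarize_reads(records: Iterable[Tuple[str, str, str]], phred_offset: int = 33) -> ReadQCSummary:
    """Accumulate simple alignment-free metrics in a single pass over the reads."""
    n_reads = n_bases = qual_sum = q30 = gc = 0
    for _name, seq, qual in records:
        n_reads += 1
        n_bases += len(seq)
        gc += sum(1 for base in seq.upper() if base in "GC")
        for ch in qual:
            q = ord(ch) - phred_offset  # decode ASCII quality character to a Phred score
            qual_sum += q
            if q >= 30:
                q30 += 1
    if n_bases == 0:
        return ReadQCSummary(n_reads, 0, 0.0, 0.0, 0.0)
    return ReadQCSummary(n_reads, n_bases, qual_sum / n_bases, q30 / n_bases, gc / n_bases)


if __name__ == "__main__":
    # Two tiny hypothetical reads, just to exercise the code path.
    demo = io.StringIO("@r1\nACGT\n+\nIIII\n@r2\nGGNN\n+\n##II\n")
    print(summarize_reads(parse_fastq(demo)))
```

Each record is touched once in a streaming loop, so the cost is linear in the number of bases and needs no reference index. That is the basic reason alignment-free metrics are cheap compared with mapping every read to a reference genome, although metrics such as coverage uniformity, ancestry, or contamination require considerably more sophisticated approaches.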
Publisher
Cold Spring Harbor Laboratory