Author:
Albrecht Steffen, Sprang Maximilian, Andrade-Navarro Miguel A., Fontaine Jean-Fred
Abstract
Controlling the quality of next-generation sequencing (NGS) data files is a necessary but complex task. To address this problem, we statistically characterize common NGS quality features and develop a novel quality control procedure involving tree-based and deep learning classification algorithms. Predictive models, validated on internal and external functional genomics datasets, are to some extent generalizable to data from unseen species. The derived statistical guidelines and predictive models represent a valuable resource for users of NGS data to better understand quality issues and perform automatic quality control. Our guidelines and software are available at https://github.com/salbrec/seqQscorer.
Funder
Johannes Gutenberg-Universität Mainz
International PhD Programme, Mainz
Publisher
Springer Science and Business Media LLC
Cited by
9 articles.