Quality Assessment of High-throughput DNA Sequencing Data via Range analysis

Published: 2017-01-18

Author:

Külekci M. Oğuzhan, Fotouhi Ali, Majidi Mina

Abstract

A number of recent studies address the quality assessment of sequencing data. These efforts have largely focused on reporting statistical parameters of the distribution of the quality scores and/or the base calls in a FASTQ file. We investigate another dimension of quality assessment, motivated by the fact that reads containing long intervals with few errors improve the performance of post-processing tools in downstream analysis. Thus, the quality assessment procedures proposed in this study aim to analyze the segments of the reads that are above a certain quality. We define an interval of a read to be of desired quality when it contains at most k quality scores less than or equal to a threshold value v, for some v and k provided by the user. We present an algorithm to detect those ranges and introduce new metrics computed from their lengths. These metrics include the mean values of the longest, shortest, average, cubic average, and average variation coefficient of the fragment lengths that qualify under the v and k input parameters. We provide a new software tool, QASDRA, for quality assessment of sequencing data via range analysis. QASDRA, implemented in Python and publicly available at https://github.com/ali-cp/QASDRA.git, creates the quality assessment report of an input FASTQ file according to the user-specified k and v parameters. It can also filter out reads according to the metrics introduced.
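The interval definition in the abstract can be illustrated with a minimal Python sketch. This is not QASDRA's code, and the abstract does not describe the authors' actual algorithm. The function names `phred33`, `maximal_ranges`, and `length_summary` are hypothetical, and the quality string is assumed to use Phred+33 encoding. The sketch lists every maximal interval of a read that contains at most k scores less than or equal to v, then summarizes the interval lengths. The cubic average and variation coefficient are left out because the abstract does not define them.

```python
from statistics import mean


def phred33(quality_line):
    """Decode a FASTQ quality string, assuming Phred+33 encoding."""
    return [ord(ch) - 33 for ch in quality_line]


def maximal_ranges(scores, v, k):
    """Return maximal half-open intervals (start, end) of `scores`
    that contain at most k values less than or equal to v.

    Illustrative approach (an assumption, not the paper's algorithm):
    a maximal interval is bounded on each side by a low score or by
    the end of the read, so it is enough to slide over the positions
    of the low scores.
    """
    n = len(scores)
    low = [i for i, q in enumerate(scores) if q <= v]
    if len(low) <= k:
        # The whole read qualifies (or the read is empty).
        return [(0, n)] if n else []
    # Sentinels mark the positions just outside the read.
    padded = [-1] + low + [n]
    ranges = []
    for j in range(len(low) - k + 1):
        start = padded[j] + 1      # just after the preceding low score
        end = padded[j + k + 1]    # stop before the (k+1)-th low score
        if end > start:            # skip empty intervals (possible when k == 0)
            ranges.append((start, end))
    return ranges


def length_summary(ranges):
    """Longest, shortest, and mean interval length, or None if no interval."""
    lengths = [end - start for start, end in ranges]
    if not lengths:
        return None
    return {"longest": max(lengths), "shortest": min(lengths), "mean": mean(lengths)}


# Example read: '?' decodes to 30, '+' to 10, '&' to 5.
scores = phred33("??+???&?")
ranges = maximal_ranges(scores, v=20, k=1)   # [(0, 6), (3, 8)]
summary = length_summary(ranges)             # longest 6, shortest 5, mean 5.5
```

With k = 0 the same read yields the three error-free stretches (0, 2), (3, 6), and (7, 8). Averaging such per-read summaries over all reads in a file would give file-level values in the spirit of the metrics the abstract lists.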

Publisher

Cold Spring Harbor Laboratory
