False Negatives Are a Significant Feature of Next Generation Sequencing Callsets

Published: 2016-07-26

Author:

Bobo Dean, Lipatov Mikhail, Rodriguez-Flores Juan L., Auton Adam, Henn Brenna M.

Abstract

Short-read, next-generation sequencing (NGS) is now broadly used to identify rare or de novo mutations in population samples and disease cohorts. However, NGS data is known to be error-prone, and post-processing pipelines have primarily focused on the removal of spurious mutations or “false positives” for downstream genome datasets. Less attention has been paid to characterizing the fraction of missing mutations or “false negatives” (FN). Here we interrogate several publicly available human NGS autosomal variant datasets using corresponding Sanger sequencing as a truth-set. We examine both low-coverage Illumina and high-coverage Complete Genomics genomes. We show that the FN rate varies between 3% and 18% and that false-positive rates are considerably lower (<3%) for publicly available human genome callsets like 1000 Genomes. The FN rate is strongly dependent on calling pipeline parameters, as well as read coverage. Our results demonstrate that missing mutations are a significant feature of genomic datasets and imply that additional fine-tuning of bioinformatics pipelines is needed. To address this, we design a phylogeny-aware tool [PhyloFaN] which can be used to quantify the FN rate for haploid genomic experiments, without additional generation of validation data. Using PhyloFaN on ultra-high coverage NGS data from both Illumina HiSeq and Complete Genomics platforms derived from the 1000 Genomes Project, we characterize the false negative rate in human mtDNA genomes. The false negative rate for the publicly available mtDNA callsets is 17-20%, even for extremely high coverage haploid data.

Publisher

Cold Spring Harbor Laboratory

References: 46 articles.

1. Sequence and organization of the human mitochondrial genome

2. Reanalysis and revision of the Cambridge reference sequence for human mitochondrial DNA

3. A global reference for human genetic variation

4. A Fine-Scale Chimpanzee Genetic Map from Population Sequencing

5. Exome sequencing as a tool for Mendelian disease gene discovery

Cited by 10 articles.

1. Scalable inference of cell differentiation networks in gene therapy clonal tracking studies of haematopoiesis; Bioinformatics; 2023-09-29

2. Stochastic modelling of cell differentiation networks from partially-observed clonal tracking data; 2022-07-10

3. Advances and Trends in Omics Technology Development; Frontiers in Medicine; 2022-07-01

4. Comparison of Next-Generation Sequencing and Polymerase Chain Reaction for Personalized Treatment-Related Genomic Status in Patients with Metastatic Colorectal Cancer; Current Issues in Molecular Biology; 2022-04-05

5. Role of Helicobacter pylori and Other Environmental Factors in the Development of Gastric Dysbiosis; Pathogens; 2021-09-16