Genetic variant pathogenicity prediction trained using disease-specific clinical sequencing data sets

Published: 2019-06-24; Volume 29, Issue 7, pages 1144-1151
Journal: Genome Research (Genome Res.)
ISSN: 1088-9051
Language: English

Authors:

Perry Evans, Chao Wu, Amanda Lindy, Dianalee A. McKnight, Matthew Lebo, Mahdi Sarmady, Ahmad N. Abou Tayoun

Abstract

Recent advances in DNA sequencing have expanded our understanding of the molecular basis of genetic disorders and increased the utilization of clinical genomic tests. Given the paucity of evidence to accurately classify each variant and the difficulty of experimentally evaluating its clinical significance, a large number of variants generated by clinical tests are reported as variants of unknown clinical significance. Population-scale variant databases can improve clinical interpretation. Specifically, pathogenicity prediction for novel missense variants can use features describing regional variant constraint. Constrained genomic regions are those that have an unusually low variant count in the general population. Computational methods have been introduced to capture these regions and incorporate them into pathogenicity classifiers, but these methods have yet to be compared on an independent clinical variant data set. Here, we introduce a variant data set derived from clinical sequencing panels and use it to compare the ability of different genomic constraint metrics to determine missense variant pathogenicity. This data set is compiled from 17,071 patients tested with clinical genomic sequencing for cardiomyopathy, epilepsy, or RASopathies. We further use this data set to demonstrate the necessity of disease-specific classifiers and to train PathoPredictor, a disease-specific ensemble classifier of pathogenicity based on regional constraint and variant-level features. PathoPredictor achieves an average precision >90% for variants from all 99 tested disease genes while approaching 100% accuracy for some genes. The accumulation of larger clinical variant training data sets can significantly enhance classifier performance in a disease- and gene-specific manner.
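The headline metric in the abstract, average precision, summarizes how well a classifier ranks pathogenic variants above benign ones. A minimal sketch (not the authors' evaluation code), assuming the common non-interpolated definition: AP is the mean of the precision measured at the rank of each pathogenic variant. The per-gene grouping mirrors the abstract's per-gene reporting; the function names, gene labels (`GENE_A`, `GENE_B`), and scores are hypothetical toy values.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Sequence, Tuple


def average_precision(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Non-interpolated average precision.

    labels: 1 = pathogenic, 0 = benign.
    scores: classifier output; higher means more likely pathogenic.
    Tied scores are ranked in input order (a simplification).
    """
    n_pos = sum(labels)
    if n_pos == 0:
        raise ValueError("average precision is undefined without pathogenic labels")
    # Rank variants from most to least predicted-pathogenic (stable sort).
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    hits = 0
    total = 0.0
    for rank, i in enumerate(order, start=1):
        if labels[i] == 1:
            hits += 1
            total += hits / rank  # precision at the rank of this pathogenic variant
    return total / n_pos


def per_gene_average_precision(
    records: Iterable[Tuple[str, int, float]],
) -> Dict[str, float]:
    """Group hypothetical (gene, label, score) records and score each gene separately.

    Genes with no pathogenic variants are skipped because AP is undefined for them.
    """
    by_gene: Dict[str, Tuple[List[int], List[float]]] = defaultdict(lambda: ([], []))
    for gene, label, score in records:
        by_gene[gene][0].append(label)
        by_gene[gene][1].append(score)
    return {
        gene: average_precision(labels, scores)
        for gene, (labels, scores) in by_gene.items()
        if sum(labels) > 0
    }


if __name__ == "__main__":
    # Toy records for illustration only; not data from the study.
    toy = [
        ("GENE_A", 1, 0.95), ("GENE_A", 0, 0.40), ("GENE_A", 1, 0.70),
        ("GENE_B", 0, 0.90), ("GENE_B", 1, 0.60), ("GENE_B", 1, 0.20),
    ]
    print(per_gene_average_precision(toy))
```

For example, `average_precision([1, 0, 1, 1, 0], [0.9, 0.8, 0.7, 0.3, 0.2])` evaluates to (1/1 + 2/3 + 3/4) / 3 = 29/36, about 0.806. Library implementations such as scikit-learn's `average_precision_score` use the same step-wise definition when scores are distinct, but group tied scores into a single threshold rather than ranking them by input order.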

Publisher

Cold Spring Harbor Laboratory

Subject

Genetics (clinical), Genetics

Cited by 21 articles.

1. Toward a universal approach for predicting variant pathogenicity in diverse disease landscapes; Journal of Genetics and Genomics; 2024-07

2. Mutation types and pathogenicity classification using multi-label multi-class deep networks; AIP Conference Proceedings; 2024

3. Structural mapping of patient-associated KCNMA1 gene variants; Biophysical Journal; 2023-12

4. MmisAT and MmisP: an efficient and accurate suite of variant analysis toolkit for primary mitochondrial diseases; Human Genomics; 2023-11-27

5. Structural mapping of patient-associated KCNMA1 gene variants; 2023-07-28