Abstract
Background
Current single nucleotide variant (SNV) pathogenicity prediction tools assess various properties of genetic variants and estimate the likelihood that a variant causes disease. This information aids variant prioritization, the process of narrowing down the list of potentially pathogenic variants, and therefore facilitates diagnosis. Evaluating SNV pathogenicity tools on ClinVar data is a widely adopted practice. Our findings demonstrate that this conventional approach tends to overstate performance estimates.
Methods
We introduce SNPred, an ensemble model designed specifically to predict the pathogenicity of nonsynonymous single nucleotide variants (nsSNVs). To evaluate its performance, we used six distinct validation datasets derived from ClinVar and BRCA1 Saturation Genome Editing (SGE) data.
Results
Across all validation scenarios, SNPred consistently outperformed other state-of-the-art tools, particularly for rare and cancer-related variants and for variants classified with low confidence by most in silico tools. For convenience, we provide precalculated scores for all possible nsSNVs. We show that the exceptionally high accuracy achieved by the best models on ClinVar variants is attainable only if the models learn to replicate misclassifications present in ClinVar. Additionally, we compared predictor performance on two non-overlapping sets of BRCA1 variants: one sourced from ClinVar and the other from the SGE study. Across all in silico predictors, we observed a significant trend: ClinVar variants were classified with notably higher accuracy than SGE variants.
Conclusions
We provide a powerful variant pathogenicity predictor that enhances the quality of clinical variant interpretation, and we highlight important challenges of using ClinVar to evaluate SNV pathogenicity predictors.
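The argument that near-perfect accuracy on ClinVar implies learning ClinVar's misclassifications rests on a simple ceiling. If a fraction e of the reference labels is wrong, a predictor that recovers the true status of every variant can score at most 1 - e against those labels, so any higher score requires agreeing with mislabeled entries. The toy sketch that follows illustrates this ceiling. The 10 variants, the 20% label error rate, and the helper `measured_accuracy` are hypothetical illustrations, not data or code from the study.

```python
from typing import Sequence


def measured_accuracy(predictions: Sequence[int], labels: Sequence[int]) -> float:
    """Fraction of predictions that agree with the reference labels.

    Labels are binary: 1 = pathogenic, 0 = benign. This hypothetical helper
    scores agreement with the reference labels, not with the (unknown) truth.
    """
    if len(predictions) != len(labels) or not labels:
        raise ValueError("predictions and labels must be non-empty and of equal length")
    return sum(p == y for p, y in zip(predictions, labels)) / len(labels)


# Hypothetical toy data: the true clinical status of 10 variants.
true_status = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
# Hypothetical reference labels in which 2 of 10 entries (indices 0 and 5)
# are misclassified, i.e. a label error rate e = 0.2.
reference_labels = [0, 1, 1, 1, 1, 1, 0, 0, 0, 0]

# A predictor that recovers the true status of every variant...
perfect_predictor = list(true_status)
# ...versus one that simply reproduces the reference labels, errors included.
label_copier = list(reference_labels)

print(measured_accuracy(perfect_predictor, reference_labels))  # 0.8, i.e. 1 - e
print(measured_accuracy(label_copier, reference_labels))       # 1.0
```

In this toy setting, the predictor that is actually correct is capped at 1 - e = 0.8, while only the predictor that copies the label errors reaches a perfect score against the reference.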
Publisher
Cold Spring Harbor Laboratory
Cited by
1 article.