VarPPUD: Variant post prioritization developed for undiagnosed genetic disorders

Author:

Yin Rui, Gutierrez Alba, Kobren Shilpa Nadimpalli, Avillach Paul

Abstract

Rare and ultra-rare genetic conditions are estimated to impact nearly 1 in 17 people worldwide, yet accurately pinpointing the diagnostic variants underlying each of these conditions remains a formidable challenge. Because comprehensive, in vivo functional assessment of all possible genetic variants is infeasible, clinicians instead consider in silico variant pathogenicity predictions to distinguish plausibly disease-causing from benign variants across the genome. However, in the most difficult undiagnosed cases, such as those accepted to the Undiagnosed Diseases Network (UDN), existing pathogenicity predictions cannot reliably discern true etiological variant(s) from other deleterious candidate variants that were prioritized through N-of-1 efforts. Pinpointing the disease-causing variant from a pool of plausible candidates remains a largely manual effort requiring extensive clinical workups, functional and experimental assays, and eventual identification of genotype- and phenotype-matched individuals. Here, we introduce VarPPUD, a tool trained on prioritized variants from UDN cases, that leverages gene-, amino acid-, and nucleotide-level features to discern pathogenic variants from other deleterious variants that are unlikely to be confirmed as disease relevant. VarPPUD achieves a cross-validated accuracy of 79.3% and precision of 77.5% on a held-out subset of uniquely challenging UDN cases, respectively representing an average 18.6% and 23.4% improvement over nine traditional pathogenicity prediction approaches on this task. We validate VarPPUD’s ability to discriminate likely from unlikely pathogenic variants on synthetic, GAN-generated candidate variants as well. Finally, we show how VarPPUD can be probed to evaluate each input feature’s importance and contribution toward prediction, an essential step toward understanding the distinct characteristics of newly uncovered disease-causing variants.

Significance Statement

Patients with chronic, undiagnosed and underdiagnosed genetic conditions often endure expensive and excruciating years-long diagnostic odysseys without clear results. In many instances, clinical genome sequencing of patients and their family members fails to reveal known disease-causing variants, although compelling variants of uncertain significance are frequently encountered. Existing computational tools struggle to reliably differentiate truly disease-causing variants from other plausible candidate variants within these prioritized sets. Consequently, the confirmation of disease-causing variants often necessitates extensive experimental follow-up, including studies in model organisms and identification of other similarly presenting genotype-matched individuals, a process that can extend for several years. Here, we present VarPPUD, a tool trained specifically to distinguish likely from unlikely to be confirmed pathogenic variants that were prioritized across cases in the Undiagnosed Diseases Network. By evaluating the importance and impact of different input feature values on prediction, we gain deeper insights into the distinctive attributes of difficult-to-identify diagnostic variants. For patients who remain undiagnosed following comprehensive whole genome sequencing, our new method VarPPUD may reveal pathogenic variants amid a pool of candidate variants, thereby advancing diagnostic efforts where progress has otherwise stalled.

Publisher

Cold Spring Harbor Laboratory
