An augmented transformer model trained on family specific variant data leads to improved prediction of variants of uncertain significance

Author:

Joshi Dinesh1,Pradhan Swatantra1,Sajeed Rakshanda1,Sriniva Rajgopal1,Rana Sadhna1

Affiliation:

1. Tata Consultancy Services

Abstract

Abstract Variants of uncertain significance (VUS) represent variants that lack sufficient evidence to be confidently associated with a disease thus posing challenge in interpretation of genetic testing results. In this work, we present an improved gene specific approach to variant prediction that leverages a pre-trained protein language model for predicting VUS. Our deep learning model combines zero-shot log odd scores from evolutionary scale model (ESM-2) as a feature along with embeddings from ESM-2 as features for training a supervised model on variants associated with the gene. Our training set creation approach uses variant data from a gene family if the gene of interest has low or no functional data for training a gene specific predictor. We demonstrate the accuracy of our method by testing it on VUS of an enzyme NAGLU (Alpha-N-acetylglucosaminidase) whose deficiency due to mutations is known to cause a rare genetic disorder, Mucopolysaccharidosis IIIB or Sanfillipo B disease. Our model augmented with contextual information from the gene family improves prediction of VUS in the NAGLU gene and outperforms state-of-the-art pathogenicity predictors. Our results also indicate that genes that have sparse or no experimental variant impact data, the family variant data can serve as a proxy training data for making accurate predictions.

Publisher

Research Square Platform LLC

同舟云学术

1.学者识别学者识别

2.学术分析学术分析

3.人才评估人才评估

"同舟云学术"是以全球学者为主线,采集、加工和组织学术论文而形成的新型学术文献查询和分析系统,可以对全球学者进行文献检索和人才价值评估。用户可以通过关注某些学科领域的顶尖人物而持续追踪该领域的学科进展和研究前沿。经过近期的数据扩容,当前同舟云学术共收录了国内外主流学术期刊6万余种,收集的期刊论文及会议论文总量共计约1.5亿篇,并以每天添加12000余篇中外论文的速度递增。我们也可以为用户提供个性化、定制化的学者数据。欢迎来电咨询!咨询电话:010-8811{复制后删除}0370

www.globalauthorid.com

TOP

Copyright © 2019-2024 北京同舟云网络信息技术有限公司
京公网安备11010802033243号  京ICP备18003416号-3