Enhancing missense variant pathogenicity prediction with protein language models using VariPred-Reference-Cited by-同舟云学术

Enhancing missense variant pathogenicity prediction with protein language models using VariPred

Published:2024-04-07 Issue:1 Volume:14 Page:
ISSN:2045-2322
Container-title:Scientific Reports
language:en
Short-container-title:Sci Rep

Author:

Lin Weining^ORCID,Wells Jude^ORCID,Wang Zeyuan^ORCID,Orengo Christine^ORCID,Martin Andrew C. R.^ORCID

Abstract

AbstractComputational approaches for predicting the pathogenicity of genetic variants have advanced in recent years. These methods enable researchers to determine the possible clinical impact of rare and novel variants. Historically these prediction methods used hand-crafted features based on structural, evolutionary, or physiochemical properties of the variant. In this study we propose a novel framework that leverages the power of pre-trained protein language models to predict variant pathogenicity. We show that our approach VariPred (Variant impact Predictor) outperforms current state-of-the-art methods by using an end-to-end model that only requires the protein sequence as input. Using one of the best-performing protein language models (ESM-1b), we establish a robust classifier that requires no calculation of structural features or multiple sequence alignments. We compare the performance of VariPred with other representative models including 3Cnet, Polyphen-2, REVEL, MetaLR, FATHMM and ESM variant. VariPred performs as well as, or in most cases better than these other predictors using six variant impact prediction benchmarks despite requiring only sequence data and no pre-processing of the data.

Publisher

Springer Science and Business Media LLC

Link

https://www.nature.com/articles/s41598-024-51489-7.pdf

Reference36 articles.

1. Ng, P. C. & Henikoff, S. Predicting deleterious amino acid substitutions. Genome Res. 11, 863–874 (2001).

2. Ramensky, V. Human non-synonymous SNPs: Server and survey. Nucleic Acids Res. 30, 3894–3900 (2002).

3. Reva, B., Antipin, Y. & Sander, C. Predicting the functional impact of protein mutations: Application to cancer genomics. Nucleic Acids Res. 39, e118 (2011).

4. Kumar, P., Henikoff, S. & Ng, P. C. Predicting the effects of coding non-synonymous variants on protein function using the SIFT algorithm. Nat. Protoc. 4, 1073–1081 (2009).

5. Sundaram, L. et al. Predicting the clinical impact of human mutation with deep neural networks. Nat. Genet. 50, 1161–1170 (2018).