TransPPMP: predicting pathogenicity of frameshift and non-sense mutations by a Transformer based on protein features

Authors:

Nie Liangpeng (1), Quan Lijun (1,2,3), Wu Tingfang (1,2,3), He Ruji (1), Lyu Qiang (1,2,3)

Affiliations:

1. School of Computer Science and Technology, Soochow University, Suzhou 215006, China

2. Province Key Lab for Information Processing Technologies, Soochow University, Suzhou 215006, China

3. Collaborative Innovation Center of Novel Software Technology and Industrialization, Nanjing 210000, China

Abstract

Motivation: Protein structure can be severely disrupted by frameshift and non-sense mutations at specific positions in the protein sequence, yet frameshift and non-sense mutations are also found in healthy individuals. A method to distinguish neutral from potentially disease-associated frameshift and non-sense mutations is therefore of practical and fundamental importance: it would allow researchers to rapidly screen potentially pathogenic sites from a large number of mutated genes and then use these sites as drug targets to speed up diagnosis and improve access to treatment. This problem remains under-researched.

Results: We built a Transformer-based neural network model, named TransPPMP, to predict the pathogenicity of frameshift and non-sense mutations from protein features. The feature matrix of the contextual sequence computed by the ESM pre-trained model, the type of the mutated residue and auxiliary features, including structure and function information, are combined as the input, and a focal loss function is used to address the sample imbalance during training. In 10-fold cross-validation and on an independent blind test set, TransPPMP showed robust performance and a clear advantage on all evaluation metrics over four other advanced methods, namely ENTPRISE-X, VEST-indel, DDIG-in and CADD. In addition, we demonstrate the usefulness of the multi-head attention mechanism in the Transformer for predicting the pathogenicity of mutations: not only can multiple self-attention heads learn local and global interactions, but functional sites with a large influence on the mutated residue can also be captured by the attention focus. These could offer useful clues for studying the pathogenicity mechanisms of complex human diseases, for which traditional machine learning methods fall short.

Availability and implementation: TransPPMP is available at https://github.com/lennylv/TransPPMP.

Supplementary information: Supplementary data are available at Bioinformatics online.
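
The abstract names three concrete ingredients: per-residue features from the ESM pre-trained model combined with auxiliary structure and function information, a Transformer encoder whose multi-head self-attention captures local and global context, and a focal loss that counters the imbalance between neutral and pathogenic examples. The PyTorch sketch below only illustrates how such pieces can fit together; the PathogenicityClassifier name, the feature and layer sizes, and the focal-loss hyperparameters are illustrative assumptions, not the authors' published configuration.

import torch
import torch.nn as nn
import torch.nn.functional as F

class PathogenicityClassifier(nn.Module):
    """Toy Transformer encoder over per-residue feature vectors (hypothetical sizes)."""
    def __init__(self, feat_dim=1300, d_model=256, n_heads=8, n_layers=2):
        super().__init__()
        # feat_dim stands in for an ESM embedding plus auxiliary structure/function features.
        self.proj = nn.Linear(feat_dim, d_model)
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)
        self.head = nn.Linear(d_model, 1)            # one logit: pathogenic vs. neutral

    def forward(self, x):                             # x: (batch, seq_len, feat_dim)
        h = self.encoder(self.proj(x))                # multi-head self-attention over the context window
        return self.head(h.mean(dim=1)).squeeze(-1)   # pool the window and emit a logit

def focal_loss(logits, targets, alpha=0.25, gamma=2.0):
    """Binary focal loss (Lin et al., 2017); alpha/gamma are illustrative defaults."""
    ce = F.binary_cross_entropy_with_logits(logits, targets, reduction="none")
    p = torch.sigmoid(logits)
    p_t = p * targets + (1 - p) * (1 - targets)             # probability of the true class
    alpha_t = alpha * targets + (1 - alpha) * (1 - targets)
    return (alpha_t * (1 - p_t) ** gamma * ce).mean()       # down-weight easy, well-classified examples

# Minimal usage on random tensors standing in for ESM plus auxiliary features.
model = PathogenicityClassifier()
x = torch.randn(4, 64, 1300)              # 4 mutation-centred windows of 64 residues
y = torch.tensor([0.0, 1.0, 0.0, 0.0])    # imbalanced labels, as in the real data
loss = focal_loss(model(x), y)
loss.backward()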

Funder

National Natural Science Foundation of China

Natural Science Foundation of Jiangsu Province Youth Fund

A Project Funded by the Priority Academic Program Development of Jiangsu Higher Education Institutions

Collaborative Innovation Center of Novel Software Technology and Industrialization

Publisher

Oxford University Press (OUP)

Subject

Computational Mathematics, Computational Theory and Mathematics, Computer Science Applications, Molecular Biology, Biochemistry, Statistics and Probability
