ESMSec: Prediction of Secreted Proteins in Human Body Fluids Using Protein Language Models and Attention

Authors:

Wang Yan 1, Sun Huiting 1, Sheng Nan 1, He Kai 2, Hou Wenjv 1, Zhao Ziqi 1, Yang Qixing 1, Huang Lan 1

Affiliation:

1. Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, College of Computer Science and Technology, Jilin University, Changchun 130012, China

2. Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48103, USA

Abstract

Proteins secreted into human body fluids are potential disease biomarkers that can support early diagnosis and risk prediction, so studying them has considerable practical value. In recent years, deep-learning-based transformer language models have been transferred from natural language processing (NLP) to proteomics, leading to the development of protein language models (PLMs) for protein sequence representation. Here, we propose a deep learning framework called ESM Predict Secreted Proteins (ESMSec) to predict proteins secreted into three types of human body fluid. ESMSec is based on the ESM2 model and an attention architecture. First, the protein sequences are fed into the ESM2 model, features are extracted from its last hidden layer, and every input protein is encoded as a fixed 1000 × 480 matrix. Second, multi-head attention followed by a fully connected neural network serves as the classifier, performing binary classification of whether each protein is secreted into a given body fluid. Our experiments used three important and ubiquitous human body fluids: plasma, cerebrospinal fluid (CSF), and seminal fluid. ESMSec achieved average accuracies of 0.8486, 0.8358, and 0.8325 on the respective testing datasets, which on average outperform the state-of-the-art (SOTA) methods. These results indicate that ESM can improve the prediction performance of the model and has great potential for screening the secretion information of human body fluid proteins.
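The fixed-size encoding step described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the abstract states only the target shape (1000 × 480), so the choices here (keeping the first 1000 residues, appending zero rows, and the helper names `to_fixed_matrix` and `padding_mask`) are assumptions. The per-residue embeddings are taken as given; in the paper they come from the last hidden layer of ESM2.

```python
from typing import List

MAX_LEN = 1000   # fixed sequence length stated in the abstract
EMBED_DIM = 480  # per-residue feature width stated in the abstract


def to_fixed_matrix(residue_embeddings: List[List[float]],
                    max_len: int = MAX_LEN,
                    embed_dim: int = EMBED_DIM) -> List[List[float]]:
    """Truncate or zero-pad per-residue embeddings to max_len x embed_dim.

    Assumption: truncation keeps the first max_len residues and padding
    appends zero rows. The abstract does not specify either choice.
    """
    for row in residue_embeddings:
        if len(row) != embed_dim:
            raise ValueError(
                f"expected rows of width {embed_dim}, got {len(row)}")
    # Keep at most max_len rows (copying so the input is not mutated).
    kept = [list(row) for row in residue_embeddings[:max_len]]
    # Fill the remainder, if any, with all-zero rows.
    padding = [[0.0] * embed_dim for _ in range(max_len - len(kept))]
    return kept + padding


def padding_mask(seq_len: int, max_len: int = MAX_LEN) -> List[bool]:
    """True marks padded positions that an attention layer could ignore."""
    real = min(seq_len, max_len)
    return [False] * real + [True] * (max_len - real)


# Usage: a hypothetical 3-residue protein with dummy embeddings.
short_protein = [[1.0] * EMBED_DIM for _ in range(3)]
matrix = to_fixed_matrix(short_protein)
mask = padding_mask(len(short_protein))
print(len(matrix), len(matrix[0]), sum(mask))
```

A mask of this kind is one common way to keep padded rows from influencing a downstream multi-head attention classifier; whether ESMSec masks padding is not stated in the abstract.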

Funder

National Natural Science Foundation of China

Development Project of Jilin Province of China

Jilin Provincial Key Laboratory of Big Data Intelligent Cognition

Publisher

MDPI AG

