UniDL4BioPep: a universal deep learning architecture for binary classification in peptide bioactivity

Author:

Du Zhenjiao1, Ding Xingjian2, Xu Yixiang3, Li Yonghui1

Affiliation:

1. Department of Grain Science and Industry, Kansas State University, Manhattan, KS 66506, USA

2. Department of Computer Science, Kansas State University, Manhattan, KS 66506, USA

3. Healthy Processed Foods Research Unit, Western Regional Research Center, USDA-ARS, 800 Buchanan Street, Albany, CA 94710, USA

Abstract

Identification of potent peptides through model prediction can reduce benchwork in wet experiments. However, the conventional process of model building can be complex and time-consuming because of challenges such as peptide representation, feature selection, model selection and hyperparameter tuning. Recently, advanced pretrained deep learning-based language models (LMs) have been released for protein sequence embedding and applied to structure and function prediction. Building on these developments, we have developed UniDL4BioPep, a universal deep-learning model architecture for transfer learning in binary classification of bioactive peptides. It directly assists users in training a high-performance deep-learning model with a fixed architecture and achieves cutting-edge performance, meeting the demand for efficient discovery of novel bioactive peptides. To the best of our knowledge, this is the first time that a pretrained biological language model has been used to generate peptide embeddings and shown, through large-scale evaluation of those embeddings, to successfully predict peptide bioactivities. The model was also validated through uniform manifold approximation and projection analysis. By combining the LM with a convolutional neural network, UniDL4BioPep outperformed the respective state-of-the-art models on 15 of 20 bioactivity dataset prediction tasks. The accuracy, Matthews correlation coefficient and area under the curve were 0.7–7%, 1.23–26.7% and 0.3–25.6% higher, respectively. A user-friendly web server of UniDL4BioPep for the tested bioactivities is established and freely accessible at https://nepc2pvmzy.us-east-1.awsapprunner.com. The source code, datasets and templates of UniDL4BioPep for other bioactivity fitting and prediction tasks are available at https://github.com/dzjxzyd/UniDL4BioPep.
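For readers unfamiliar with the three metrics named in the abstract, the following is a minimal sketch of how accuracy, the Matthews correlation coefficient (MCC) and the area under the ROC curve (AUC) can be computed for a binary classifier. It is not taken from the UniDL4BioPep repository; the function names, the toy labels and scores, and the 0.5 decision threshold are all assumptions made for illustration.

```python
import math


def confusion_counts(y_true, y_pred):
    """Return (tp, tn, fp, fn) for binary labels coded as 0/1."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    return (tp + tn) / (tp + tn + fp + fn)


def mcc(y_true, y_pred):
    """Matthews correlation coefficient; returns 0.0 when undefined."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


def roc_auc(y_true, scores):
    """AUC as the probability that a positive outranks a negative (ties count 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data (hypothetical): 1 = bioactive, 0 = non-bioactive.
y_true = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.6, 0.2, 0.1]        # predicted probabilities
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

print(f"ACC={accuracy(y_true, y_pred):.3f}")
print(f"MCC={mcc(y_true, y_pred):.3f}")
print(f"AUC={roc_auc(y_true, scores):.3f}")
```

Accuracy and MCC depend on the chosen threshold, whereas AUC is computed from the raw scores, which is one reason the three are usually reported together.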

Funder

Kansas Agricultural Experimental Station

Agriculture and Food Research Initiative Competitive

National Institute of Food and Agriculture

Global Food Systems initiative of Kansas State University

Publisher

Oxford University Press (OUP)

Subject

Molecular Biology, Information Systems

Cited by 34 articles.