ToxDL: deep learning using primary structure and domain embeddings for assessing protein toxicity

Author:

Pan Xiaoyong123ORCID,Zuallaert Jasper24ORCID,Wang Xi3,Shen Hong-Bin1ORCID,Campos Elda Posada3,Marushchak Denys O3ORCID,De Neve Wesley24

Affiliation:

1. Department of Automation, Institute of Image Processing and Pattern Recognition, Shanghai Jiao Tong University, and Key Laboratory of System Control and Information Processing, Ministry of Education of China, Shanghai 200240, China

2. Department for Electronics and Information Systems, IDLab, Ghent University, Ghent 9000, Belgium

3. BASF Belgium Coordination Center – Innovation Center Gent, Ghent 9000, Belgium

4. Department of Environmental Technology, Food Technology and Molecular Biotechnology, Center for Biotech Data Science, Ghent University Global Campus, Songdo, Incheon 305-701, South Korea

Abstract

Abstract Motivation Genetically engineering food crops involves introducing proteins from other species into crop plant species or modifying already existing proteins with gene editing techniques. In addition, newly synthesized proteins can be used as therapeutic protein drugs against diseases. For both research and safety regulation purposes, being able to assess the potential toxicity of newly introduced/synthesized proteins is of high importance. Results In this study, we present ToxDL, a deep learning-based approach for in silico prediction of protein toxicity from sequence alone. ToxDL consists of (i) a module encompassing a convolutional neural network that has been designed to handle variable-length input sequences, (ii) a domain2vec module for generating protein domain embeddings and (iii) an output module that classifies proteins as toxic or non-toxic, using the outputs of the two aforementioned modules. Independent test results obtained for animal proteins and cross-species transferability results obtained for bacteria proteins indicate that ToxDL outperforms traditional homology-based approaches and state-of-the-art machine-learning techniques. Furthermore, through visualizations based on saliency maps, we are able to verify that the proposed network learns known toxic motifs. Moreover, the saliency maps allow for directed in silico modification of a sequence, thus making it possible to alter its predicted protein toxicity. Availability and implementation ToxDL is freely available at http://www.csbio.sjtu.edu.cn/bioinf/ToxDL/. The source code can be found at https://github.com/xypan1232/ToxDL. Supplementary information Supplementary data are available at Bioinformatics online.

Funder

National Natural Science Foundation of China

Science and Technology Commission of Shanghai Municipality

BASF

Ghent University

Ghent University Global Campus

Flanders Innovation & Entrepreneurship

Fund for Scientific Research-Flanders

European Union

Publisher

Oxford University Press (OUP)

Subject

Computational Mathematics,Computational Theory and Mathematics,Computer Science Applications,Molecular Biology,Biochemistry,Statistics and Probability

Reference31 articles.

同舟云学术

1.学者识别学者识别

2.学术分析学术分析

3.人才评估人才评估

"同舟云学术"是以全球学者为主线,采集、加工和组织学术论文而形成的新型学术文献查询和分析系统,可以对全球学者进行文献检索和人才价值评估。用户可以通过关注某些学科领域的顶尖人物而持续追踪该领域的学科进展和研究前沿。经过近期的数据扩容,当前同舟云学术共收录了国内外主流学术期刊6万余种,收集的期刊论文及会议论文总量共计约1.5亿篇,并以每天添加12000余篇中外论文的速度递增。我们也可以为用户提供个性化、定制化的学者数据。欢迎来电咨询!咨询电话:010-8811{复制后删除}0370

www.globalauthorid.com

TOP

Copyright © 2019-2024 北京同舟云网络信息技术有限公司
京公网安备11010802033243号  京ICP备18003416号-3