A Deep Learning Approach with Data Augmentation to Predict Novel Spider Neurotoxic Peptides

Published:2021-11-13 Issue:22 Volume:22 Page:12291
ISSN:1422-0067
Container-title:International Journal of Molecular Sciences
language:en
Short-container-title:IJMS

Author:

Lee Byungjo, Shin Min Kyoung, Hwang In-Wook, Jung Junghyun, Shim Yu Jeong, Kim Go Woon, Kim Seung Tae, Jang Wonhee, Sung Jung-Suk

Abstract

As major components of spider venoms, neurotoxic peptides exhibit structural diversity and target specificity and have great pharmaceutical potential. Deep learning may be an alternative to the laborious and time-consuming methods for identifying these peptides. However, the major hurdle in developing a deep learning model is the limited data on neurotoxic peptides. Here, we present a peptide data augmentation method that improves the recognition of neurotoxic peptides by a convolutional neural network model. The neurotoxic peptide dataset was augmented using known neurotoxic peptides from the UniProt database, and models were trained on a training set with or without the generated sequences to verify the augmented data. The model trained with the augmented dataset outperformed the one trained with the unaugmented dataset, achieving an accuracy of 0.9953, precision of 0.9922, recall of 0.9984, and F1 score of 0.9953 on the simulation dataset. Applying the model to the set of all RNA transcripts of the spider Callobius koreanus, we identified 275 putative neurotoxic peptides, of which 252 were novel sequences and only 23 showed homology with known peptides by Basic Local Alignment Search Tool (BLAST) search. Among these 275 peptides, four were selected and shown to have neuromodulatory effects on the human neuroblastoma cell line SH-SY5Y. The augmentation method presented here may be applied to the identification of other functional peptides from biological resources with insufficient data.
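
The reported metrics can be sanity-checked against one another, since the F1 score is by definition the harmonic mean of precision and recall. The following is a minimal sketch, not code from the paper; the function name `f1_from_precision_recall` is a hypothetical helper introduced here for illustration, and only the precision and recall values come from the abstract.

```python
def f1_from_precision_recall(precision: float, recall: float) -> float:
    """Return the F1 score, i.e. the harmonic mean of precision and recall."""
    if precision + recall == 0:
        # Convention: F1 is taken as 0 when both precision and recall are 0.
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Values reported in the abstract for the model trained on the augmented dataset.
reported_precision = 0.9922
reported_recall = 0.9984

f1 = f1_from_precision_recall(reported_precision, reported_recall)
print(round(f1, 4))  # prints 0.9953
```

Rounded to four decimal places, the result agrees with the F1 score of 0.9953 stated in the abstract.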

Funder

National Institute of Biological Resources

Publisher

MDPI AG

Subject

Inorganic Chemistry,Organic Chemistry,Physical and Theoretical Chemistry,Computer Science Applications,Spectroscopy,Molecular Biology,General Medicine,Catalysis

Link

https://www.mdpi.com/1422-0067/22/22/12291/pdf

Cited by 13 articles.

1. A personal view on the history of toxins: From ancient times to artificial intelligence;Toxicon;2024-09

2. MultiToxPred 1.0: a novel comprehensive tool for predicting 27 classes of protein toxins using an ensemble machine learning approach;BMC Bioinformatics;2024-04-12

3. Therapeutic potential of snake venom: Toxin distribution and opportunities in deep learning for novel drug discovery;Medicine in Drug Discovery;2024-02

4. COVID-19 infection analysis framework using novel boosted CNNs and radiological images;Scientific Reports;2023-12-09

5. Deep learning tools to accelerate antibiotic discovery;Expert Opinion on Drug Discovery;2023-10-04