MiPepid: MicroPeptide identification tool using machine learning

Published: 2019-11-08, Volume 20, Issue 1
ISSN: 1471-2105
Journal: BMC Bioinformatics
Language: en

Authors:

Zhu Mengmeng, Gribskov Michael

Abstract

Background: Micropeptides are small proteins with a length of ≤ 100 amino acids. Short open reading frames (ORFs) that could produce micropeptides were traditionally ignored owing to technical difficulties, and few small peptides had been experimentally confirmed. In the past decade, a growing number of micropeptides have been shown to play significant roles in vital biological activities. Despite the increased amount of data, bioinformatics tools for specifically identifying micropeptides from DNA sequences are still lacking. Indeed, most existing tools for classifying coding and noncoding ORFs were built on datasets in which “normal-sized” proteins were treated as positives and short ORFs were generally treated as noncoding. Since the functional and biophysical constraints on small peptides are likely to differ from those on “normal” proteins, methods for predicting short translated ORFs must be trained independently from those for longer proteins.

Results: In this study, we developed MiPepid, a machine-learning tool specifically for the identification of micropeptides. We trained MiPepid on carefully cleaned data from existing databases, using logistic regression with 4-mer features. Given only the sequence of an ORF, MiPepid predicts whether it encodes a micropeptide with 96% accuracy on a blind dataset of high-confidence micropeptides, and it correctly classifies newly discovered micropeptides that were included in neither the training nor the blind test data. Compared with state-of-the-art coding potential prediction methods, MiPepid performs exceptionally well, as the other methods incorrectly classify most bona fide micropeptides as noncoding. MiPepid is alignment-free and runs fast enough for genome-scale analyses. It is easy to use and is available at https://github.com/MindAI/MiPepid.

Conclusions: MiPepid was developed to specifically predict micropeptides, a category of proteins of increasing significance, from DNA sequences. It shows clear advantages over existing coding potential prediction methods for micropeptide identification. It is ready to use and runs fast.
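The abstract outlines the approach at a high level: find short ORFs in DNA, describe each ORF by its 4-mer composition, and score it with logistic regression. The Python sketch below illustrates that idea under stated assumptions; it is not MiPepid's code. The forward-strand-only ORF scan, the normalization of overlapping 4-mer counts to frequencies, and the gradient-descent trainer with its learning rate, epoch count and L2 penalty are illustrative choices made here, and the function names (`find_short_orfs`, `kmer_features`, `train_logistic_regression`, `predict_proba`) are hypothetical. The actual tool is available at https://github.com/MindAI/MiPepid.

```python
import math
from itertools import product

STOP_CODONS = {"TAA", "TAG", "TGA"}
# Micropeptide cutoff from the abstract: <= 100 residues,
# i.e. an ORF of at most 303 nt including its stop codon.
MAX_AA = 100

# All 4**4 = 256 possible 4-mers, in a fixed order so every ORF maps to the same vector layout.
KMERS = ["".join(p) for p in product("ACGT", repeat=4)]
KMER_INDEX = {kmer: i for i, kmer in enumerate(KMERS)}


def find_short_orfs(dna, max_aa=MAX_AA):
    """Return forward-strand ATG...stop ORFs that encode at most max_aa residues.

    Assumptions (not taken from MiPepid): only the forward strand is scanned,
    each ORF starts at the first ATG after the previous stop in its frame,
    ORFs without an in-frame stop are dropped, and the stop codon is kept in
    the returned sequence but not counted as a residue.
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 <= max_aa:
                    orfs.append(dna[start:i + 3])
                start = None
    return orfs


def kmer_features(seq):
    """Overlapping 4-mer frequencies (256 values summing to 1; all zeros if no valid window).

    Assumption: windows containing characters other than A/C/G/T are skipped.
    """
    seq = seq.upper()
    counts = [0.0] * len(KMERS)
    total = 0
    for i in range(len(seq) - 3):
        idx = KMER_INDEX.get(seq[i:i + 4])
        if idx is not None:
            counts[idx] += 1
            total += 1
    return [c / total for c in counts] if total else counts


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic_regression(X, y, lr=0.5, epochs=300, l2=1e-3):
    """Fit weights and bias by full-batch gradient descent on L2-regularized log loss.

    Hyperparameters are illustrative assumptions, not MiPepid's settings.
    """
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j, xj in enumerate(xi):
                if xj:  # 4-mer vectors are sparse; skip zero entries
                    grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * (gj / n + l2 * wj) for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict_proba(w, b, x):
    """Model probability that the ORF with feature vector x encodes a micropeptide."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)
```

Because every ORF maps to a fixed-length vector of 256 frequencies, ORFs of any length up to the cutoff can be scored by the same linear model, and scoring needs no alignment to reference sequences. For real data, a library implementation such as scikit-learn's `LogisticRegression` would normally replace the hand-written trainer.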

Publisher

Springer Science and Business Media LLC

Subject

Applied Mathematics, Computer Science Applications, Molecular Biology, Biochemistry, Structural Biology

Link

http://link.springer.com/content/pdf/10.1186/s12859-019-3033-9.pdf

Cited by 64 articles.

1. The role of polypeptides encoded by ncRNAs in cancer. Gene. 2024-11.

2. Multi-Omic Approaches in Cancer-Related Micropeptide Identification. Proteomes. 2024-09-13.

3. Microscopic marvels: Decoding the role of micropeptides in innate immunity. Immunology. 2024-08-26.

4. LncRNA-encoded peptides in cancer. Journal of Hematology & Oncology. 2024-08-12.

5. Exploring the Dark Matter of Human Proteome: The Emerging Role of Non-Canonical Open Reading Frame (ncORF) in Cancer Diagnosis, Biology, and Therapy. Cancers. 2024-07-26.