iLearnPlus: a comprehensive and automated machine-learning platform for nucleic acid and protein sequence analysis, prediction and visualization

Author:

Chen Zhen (1), Zhao Pei (2), Li Chen (3), Li Fuyi (3, 4, 5), Xiang Dongxu (3, 4), Chen Yong-Zi (6), Akutsu Tatsuya (7), Daly Roger J (3), Webb Geoffrey I (4), Zhao Quanzhi (1, 8), Kurgan Lukasz (9), Song Jiangning (3, 4)

Affiliation:

1. Collaborative Innovation Center of Henan Grain Crops, Henan Agricultural University, Zhengzhou 450046, China

2. State Key Laboratory of Cotton Biology, Institute of Cotton Research of Chinese Academy of Agricultural Sciences (CAAS), Anyang 455000, China

3. Monash Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, Melbourne, VIC 3800, Australia

4. Monash Centre for Data Science, Faculty of Information Technology, Monash University, Melbourne, VIC 3800, Australia

5. Department of Microbiology and Immunology, The Peter Doherty Institute for Infection and Immunity, The University of Melbourne, Melbourne, Victoria 3000, Australia

6. Laboratory of Tumor Cell Biology, Key Laboratory of Cancer Prevention and Therapy, National Clinical Research Center for Cancer, Tianjin Medical University Cancer Institute and Hospital, Tianjin Medical University, Tianjin 300060, China

7. Bioinformatics Center, Institute for Chemical Research, Kyoto University, Kyoto 611-0011, Japan

8. Key Laboratory of Rice Biology in Henan Province, Henan Agricultural University, Zhengzhou 450046, China

9. Department of Computer Science, Virginia Commonwealth University, Richmond, VA, USA

Abstract

Sequence-based analysis and prediction are fundamental bioinformatic tasks that facilitate understanding of the sequence(-structure)-function paradigm for DNAs, RNAs and proteins. Rapid accumulation of sequences requires equally pervasive development of new predictive models, which depends on the availability of effective tools that support these efforts. We introduce iLearnPlus, the first machine-learning platform with graphical- and web-based interfaces for the construction of machine-learning pipelines for analysis and predictions using nucleic acid and protein sequences. iLearnPlus provides a comprehensive set of algorithms and automates sequence-based feature extraction and analysis, construction and deployment of models, assessment of predictive performance, statistical analysis, and data visualization; all without programming. iLearnPlus includes a wide range of feature sets which encode information from the input sequences and over twenty machine-learning algorithms that cover several deep-learning approaches, outnumbering the current solutions by a wide margin. Our solution caters to experienced bioinformaticians, given the broad range of options, and biologists with no programming background, given the point-and-click interface and easy-to-follow design process. We showcase iLearnPlus with two case studies concerning prediction of long noncoding RNAs (lncRNAs) from RNA transcripts and prediction of crotonylation sites in protein chains. iLearnPlus is an open-source platform available at https://github.com/Superzchen/iLearnPlus/ with the webserver at http://ilearnplus.erc.monash.edu/.
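To make the notion of sequence-based feature extraction concrete, the following is a minimal sketch of one generic encoding, normalized k-mer composition, which turns a nucleotide sequence into a fixed-length numeric vector that a machine-learning algorithm can consume. It is an illustration only and is not taken from the iLearnPlus code base; the function name `kmer_frequencies` and its parameters are assumptions made for this example.

```python
from itertools import product


def kmer_frequencies(sequence, k=2, alphabet="ACGT"):
    """Return normalized k-mer frequencies in fixed lexicographic order.

    Hypothetical helper for illustration; not part of iLearnPlus.
    """
    sequence = sequence.upper()
    # Enumerate every possible k-mer so the vector length is always len(alphabet) ** k.
    kmers = ["".join(p) for p in product(alphabet, repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    total = 0
    for i in range(len(sequence) - k + 1):
        window = sequence[i:i + k]
        if window in counts:  # skip windows with characters outside the alphabet, e.g. N
            counts[window] += 1
            total += 1
    if total == 0:
        # No valid window (sequence too short or fully ambiguous): return an all-zero vector.
        return [0.0] * len(kmers)
    return [counts[kmer] / total for kmer in kmers]


# Example: a 16-dimensional dinucleotide composition vector.
features = kmer_frequencies("ACGTACGT", k=2)
```

Vectors of this kind, computed for every sequence in a labelled dataset, would form the input matrix for the model construction and performance assessment steps the abstract describes.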

Funder

National Health and Medical Research Council

National Natural Science Foundation of China

Australian Research Council

National Institutes of Health

Monash University

Kyoto University

Fundamental Research Funds for the Central Universities

National Natural Science Foundation of Liaoning Province

Robert J. Mattauch Endowment

Publisher

Oxford University Press (OUP)

Subject

Genetics

Cited by 120 articles.