Positive-unlabeled learning in bioinformatics and computational biology: a brief review

Author:

Li Fuyi1, Dong Shuangyu1, Leier André2, Han Meiya3, Guo Xudong4, Xu Jing5, Wang Xiaoyu6, Pan Shirui7, Jia Cangzhi8, Zhang Yang9, Webb Geoffrey I10, Coin Lachlan J M11, Li Chen12, Song Jiangning13

Affiliation:

1. Monash University, Australia

2. Department of Genetics, UAB School of Medicine, USA

3. Department of Biochemistry and Molecular Biology, Monash University, Australia

4. Ningxia University, China

5. Computer Science and Technology, Nankai University, China

6. Department of Biochemistry and Molecular Biology and Biomedicine Discovery Institute, Monash University, Australia

7. University of Technology Sydney (UTS), Ultimo, NSW, Australia

8. College of Science, Dalian Maritime University, China

9. Northwestern Polytechnical University, China

10. Faculty of Information Technology, Monash University, Australia

11. Department of Clinical Pathology, University of Melbourne, Australia

12. Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, Australia

13. Monash Biomedicine Discovery Institute, Monash University, Melbourne, Australia

Abstract

Conventional supervised binary classification algorithms have been widely applied to address significant research questions using biological and biomedical data. This classification scheme requires two fully labeled classes of data (e.g. positive and negative samples) to train a classification model. However, in many bioinformatics applications, labeling data is laborious, and negative samples may be mislabeled owing to the limited sensitivity of the experimental equipment. The positive-unlabeled (PU) learning scheme was therefore proposed to enable a classifier to learn directly from a limited number of positive samples and a large number of unlabeled samples (i.e. a mixture of positive and negative samples). To date, several PU learning algorithms have been developed to address various biological questions, such as sequence identification, functional site characterization and interaction prediction. In this paper, we revisit a collection of 29 state-of-the-art PU learning applications in bioinformatics. Various important aspects are extensively discussed, including PU learning methodology, biological application, classifier design and evaluation strategy. We also comment on the existing issues of PU learning and offer our perspectives for the future development of PU learning applications. We anticipate that our work will serve as an instrumental guideline for better understanding the PU learning framework in bioinformatics and for further developing next-generation PU learning frameworks for critical biological applications.
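To make the PU setting concrete, the following is a minimal sketch, not a method taken from this review. It assumes one widely used strategy: train a classifier to separate labeled positives from unlabeled samples, then rescale its scores by an estimate of the labeling frequency c = P(labeled | positive) (the Elkan-Noto style correction). The synthetic data, the function names (`fit_logistic`, `pu_fit`, `pu_predict_proba`) and all hyperparameters are illustrative assumptions.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_logistic(X, s, lr=0.1, epochs=500):
    """Plain logistic regression fitted by batch gradient descent."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        grad = sigmoid(X @ w + b) - s      # gradient of the log loss w.r.t. the logit
        w -= lr * (X.T @ grad) / len(s)
        b -= lr * grad.mean()
    return w, b

def pu_fit(X, s, holdout_frac=0.2, seed=0):
    """s[i] = 1 for a labeled positive, 0 for an unlabeled sample."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(s))
    n_hold = int(holdout_frac * len(s))
    hold, train = idx[:n_hold], idx[n_hold:]
    # Step 1: treat "unlabeled" as a provisional negative class.
    w, b = fit_logistic(X[train], s[train])
    # Step 2: estimate c as the mean score on held-out labeled positives.
    hold_pos = hold[s[hold] == 1]
    if len(hold_pos) == 0:
        raise ValueError("hold-out split contains no labeled positives")
    c = sigmoid(X[hold_pos] @ w + b).mean()
    return w, b, c

def pu_predict_proba(X, w, b, c):
    """Rescale labeled-vs-unlabeled scores into positive-class estimates."""
    return np.clip(sigmoid(X @ w + b) / c, 0.0, 1.0)

# Synthetic data (assumption): two Gaussian clusters, 30% of positives labeled.
rng = np.random.default_rng(42)
X_pos = rng.normal(loc=2.0, scale=1.0, size=(200, 2))
X_neg = rng.normal(loc=-2.0, scale=1.0, size=(200, 2))
X = np.vstack([X_pos, X_neg])
y_true = np.concatenate([np.ones(200), np.zeros(200)])  # hidden from the learner
s = np.zeros(400)
s[rng.choice(200, size=60, replace=False)] = 1.0         # observed labels only

w, b, c = pu_fit(X, s)
proba = pu_predict_proba(X, w, b, c)
pos_mean = proba[y_true == 1].mean()
neg_mean = proba[y_true == 0].mean()
```

The learner only ever sees `s`, never `y_true`; the true labels are kept solely to check that unlabeled positives receive higher scores than true negatives. The correction relies on the assumption that labeled positives are a random subset of all positives, which may not hold for experimentally biased biological data.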

Funder

NHMRC

Publisher

Oxford University Press (OUP)

Subject

Molecular Biology, Information Systems
