GMHCC: high-throughput analysis of biomolecular data using graph-based multiple hierarchical consensus clustering

Author:

Lu Yifu1, Yu Zhuohan1, Wang Yunhe1, Ma Zhiqiang1, Wong Ka-Chun2, Li Xiangtao1

Affiliation:

1. School of Artificial Intelligence, Jilin University, Changchun 130012, China

2. Department of Computer Science, City University of Hong Kong, Hong Kong 999077, Hong Kong SAR

Abstract

Motivation

Thanks to the development of high-throughput sequencing technologies, massive amounts of diverse biomolecular data have accumulated, revolutionizing the study of genomics and molecular biology. One of the main challenges in analyzing these data is to cluster them into subpopulations to facilitate downstream analysis. Many clustering methods have recently been developed for biomolecular data. However, these computational methods are often limited by the high dimensionality, heterogeneity and noise of the data.

Results

In our study, we develop a novel Graph-based Multiple Hierarchical Consensus Clustering (GMHCC) method, which combines an unsupervised graph-based feature ranking (FR) with a graph-based linking method to explore the multiple hierarchical information of the partitions underlying the consensus clustering for multiple types of biomolecular data. We first use a graph-based unsupervised FR model that builds a graph over pairwise features and assigns each feature a rank. Subsequently, to maintain the diversity and robustness of the basic partitions (BPs), we use multiple diverse feature subsets to generate several BPs and then explore the hierarchical structures of the multiple BPs by refining the global consensus function. Finally, we develop a new graph-based linking method, which explicitly considers the relationships between clusters, to generate the final partition. Experiments on multiple types of biomolecular data, including 35 cancer gene expression datasets and eight single-cell RNA-seq datasets, validate the effectiveness of our method over several state-of-the-art consensus clustering approaches. Furthermore, differential gene analysis, gene ontology enrichment analysis and KEGG pathway analysis are conducted, providing novel insights into cell developmental lineages and characterization mechanisms.
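To illustrate the general idea of combining several basic partitions into one consensus partition, the following is a minimal sketch of a generic evidence-accumulation scheme. It is not the GMHCC algorithm: the function names, the 0.5 threshold and the toy partitions are all assumptions made for illustration, and the feature ranking and hierarchical refinement steps described above are omitted.

```python
def co_association(partitions):
    """Fraction of base partitions in which each pair of samples shares a cluster."""
    n = len(partitions[0])
    counts = [[0] * n for _ in range(n)]
    for labels in partitions:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    counts[i][j] += 1
    return [[c / len(partitions) for c in row] for row in counts]


def consensus_labels(co, threshold=0.5):
    """Link samples whose co-association exceeds the threshold and
    label each connected component of the resulting graph."""
    n = len(co)
    labels = [-1] * n
    current = 0
    for start in range(n):
        if labels[start] != -1:
            continue
        labels[start] = current
        stack = [start]
        while stack:  # depth-first traversal of one component
            i = stack.pop()
            for j in range(n):
                if labels[j] == -1 and co[i][j] > threshold:
                    labels[j] = current
                    stack.append(j)
        current += 1
    return labels


# Hypothetical basic partitions of six samples, e.g. obtained by
# clustering the same samples on three different feature subsets.
base_partitions = [
    [0, 0, 0, 1, 1, 1],
    [1, 1, 0, 0, 0, 0],
    [0, 0, 0, 1, 1, 1],
]

co = co_association(base_partitions)
final = consensus_labels(co)
print(final)
```

In this toy example the third sample is grouped with the first two in two of the three partitions, so a majority threshold keeps it in their cluster. A real pipeline would replace the simple threshold graph with a more careful linking step.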
Availability and implementation

The source code is available at GitHub: https://github.com/yifuLu/GMHCC. The software and the supporting data can be downloaded from: https://figshare.com/articles/software/GMHCC/17111291.

Supplementary information

Supplementary data are available at Bioinformatics online.

Funder

Fundamental Research Funds for the Central Universities

Research Grants Council of the Hong Kong Special Administrative Region [CityU

Health and Medical Research Fund of the Food and Health Bureau

The Government of the Hong Kong Special Administrative Region

Hong Kong Institute for Data Science (HKIDS) at the City University of Hong Kong

City University of Hong Kong

Publisher

Oxford University Press (OUP)

Subject

Computational Mathematics, Computational Theory and Mathematics, Computer Science Applications, Molecular Biology, Biochemistry, Statistics and Probability

