Author:
Liu Fengrong, Yang Yaning, Xu Xu Steven, Yuan Min
Abstract
Many soft biclustering algorithms have been developed and applied to various biological and biomedical data analyses. However, few mutually exclusive (hard) biclustering algorithms have been proposed to date, although they can be extremely useful for identifying disease or molecular subtypes from genomic or transcriptomic data. We treated the biclustering of expression matrices as a bipartite graph partitioning problem and developed a novel biclustering algorithm, MESBC, based on Dhillon’s spectral method, to detect mutually exclusive biclusters. MESBC simultaneously detects relevant features (genes) and the corresponding subgroups, and therefore automatically uses the signature features of each subtype to perform the clustering, which improves clustering performance. In simulations, MESBC accurately detected the pre-specified biclusters, and the identified biclusters were highly consistent with the true labels. In particular, in settings with high noise, MESBC outperformed the existing NMF and Dhillon’s methods and provided markedly better accuracy. Analysis of two TCGA datasets (the BRCA and LUAD cohorts) revealed that MESBC provided similar or more accurate prognostication (i.e., a smaller p value) for overall survival in patients with breast and lung cancer, respectively, compared to the existing gold-standard subtypes for breast cancer (PAM50) and lung cancer (integrative clustering). In the TCGA lung cancer patients, MESBC detected two clinically relevant, rare subtypes that other biclustering or integrative clustering algorithms could not detect. These findings validated our hypothesis that MESBC could improve molecular subtyping in cancer patients and potentially facilitate better individual patient management, risk stratification, patient selection, and therapeutic assignment, as well as a better understanding of gene signatures and molecular pathways for the development of novel therapeutic agents.
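For orientation, the two-cluster case of the bipartite spectral partitioning that MESBC builds on can be sketched as below. This is not the MESBC algorithm, whose details the abstract does not give. The function name `spectral_bipartition`, the power-iteration solver, and the sign-based split (used here in place of the 1-D k-means step in Dhillon's paper) are assumptions made for illustration, and the sketch assumes a non-negative matrix with no all-zero rows or columns.

```python
import math
import random


def spectral_bipartition(A, iters=500, seed=0):
    """Split rows and columns of a non-negative matrix A into two co-clusters.

    Illustrative sketch only: scale A by row and column degrees, find the
    second singular vector pair by deflated power iteration, and split rows
    and columns on the sign of the rescaled entries.
    Assumes every row and column of A has a positive sum.
    """
    m, n = len(A), len(A[0])
    d1 = [sum(row) for row in A]                             # row degrees
    d2 = [sum(A[i][j] for i in range(m)) for j in range(n)]  # column degrees

    # Normalised matrix An = D1^{-1/2} A D2^{-1/2}
    An = [[A[i][j] / math.sqrt(d1[i] * d2[j]) for j in range(n)]
          for i in range(m)]

    # The leading right singular vector of An is known in closed form:
    # it is proportional to sqrt(d2), with singular value 1.
    total = sum(d2)
    v1 = [math.sqrt(x / total) for x in d2]

    def deflate(v):
        # Remove the component of v along v1.
        dot = sum(a * b for a, b in zip(v, v1))
        return [a - dot * b for a, b in zip(v, v1)]

    def normalise(v):
        norm = math.sqrt(sum(a * a for a in v))
        return [a / norm for a in v]

    # Power iteration on An^T An, restricted to the complement of v1,
    # converges to the second right singular vector.
    rng = random.Random(seed)
    v = normalise(deflate([rng.uniform(-1, 1) for _ in range(n)]))
    for _ in range(iters):
        u = [sum(An[i][j] * v[j] for j in range(n)) for i in range(m)]  # An v
        w = [sum(An[i][j] * u[i] for i in range(m)) for j in range(n)]  # An^T u
        v = normalise(deflate(w))
    u = [sum(An[i][j] * v[j] for j in range(n)) for i in range(m)]

    # Rescale by D^{-1/2} and split on sign (simplified cluster assignment).
    row_labels = [int(u[i] / math.sqrt(d1[i]) > 0) for i in range(m)]
    col_labels = [int(v[j] / math.sqrt(d2[j]) > 0) for j in range(n)]
    return row_labels, col_labels
```

Rows and columns that receive the same label form one bicluster, so each sample and each feature belongs to exactly one of the two groups. Which group is labelled 0 and which is labelled 1 is arbitrary, because a singular vector is only defined up to sign.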
Publisher
Cold Spring Harbor Laboratory
References (66 articles)
1. Mirkin B. Mathematical Classification and Clustering. Kluwer Academic Publishers, 1996.
2. Hofmann T, Puzicha J. Latent class models for collaborative filtering. Proceedings of the 16th international joint conference on Artificial intelligence. Stockholm, Sweden: Morgan Kaufmann Publishers Inc., 1999, 688–693.
3. Cheng Y, Church GM. Biclustering of Expression Data. Proceedings of the Eighth International Conference on Intelligent Systems for Molecular Biology. AAAI Press, 2000, 93–103.
4. Dhillon IS. Co-clustering documents and words using bipartite spectral graph partitioning. Proceedings of the seventh ACM SIGKDD international conference on Knowledge discovery and data mining. San Francisco, California: Association for Computing Machinery, 2001, 269–274.