Comparison of sparse biclustering algorithms for gene expression datasets

Published:2021-05-06 Issue:6 Volume:22 Page:
ISSN:1467-5463
Container-title:Briefings in Bioinformatics
language:en

Author:

Nicholls Kath¹, Wallace Chris¹,²

Affiliation:

1. Cambridge Institute for Therapeutic Immunology and Infectious Disease, University of Cambridge, Cambridge, CB2 0AW, UK

2. MRC Biostatistics Unit, Cambridge Biomedical Campus, Forvie Site, Robinson Way, Cambridge, CB2 0SR, UK

Abstract

Motivation: Gene clustering and sample clustering are commonly used to find patterns in gene expression datasets. However, genes may cluster differently in heterogeneous samples (e.g. different tissues or disease states), whilst traditional methods assume that clusters are consistent across samples. Biclustering algorithms aim to solve this issue by performing sample clustering and gene clustering simultaneously. Existing reviews of biclustering algorithms have yet to include a number of more recent algorithms and have based comparisons on simplistic simulated datasets without specific evaluation of biclusters in real datasets, using less robust metrics.

Results: We compared four classes of sparse biclustering algorithms on a range of simulated and real datasets. All algorithms generally struggled on simulated datasets with a large number of genes or implanted biclusters. We found that Bayesian algorithms with strict sparsity constraints had high accuracy on the simulated datasets and did not require any post-processing, but were considerably slower than other algorithm classes. We found that non-negative matrix factorisation algorithms performed poorly, but could be re-purposed for biclustering through a sparsity-inducing post-processing procedure we introduce; one such algorithm was one of the most highly ranked on real datasets. In a multi-tissue knockout mouse RNA-seq dataset, the algorithms rarely returned clusters containing samples from multiple different tissues, whilst such clusters were identified in a human dataset of more closely related cell types (sorted blood cell subsets). This highlights the need for further thought in the design and analysis of multi-tissue studies to avoid differences between tissues dominating the analysis.

Availability: Code to run the analysis is available at https://github.com/nichollskc/biclust_comp, including wrappers for each algorithm, implementations of evaluation metrics, and code to simulate datasets and perform pre- and post-processing. The full tables of results are available at https://doi.org/10.5281/zenodo.4581206.

Publisher

Oxford University Press (OUP)

Subject

Molecular Biology, Information Systems

Link

https://academic.oup.com/bib/article-pdf/22/6/bbab140/41087404/bbab140.pdf

References: 34 articles.

1. Shifting and scaling patterns from gene expression data;Aguilar-Ruiz;Bioinformatics,2005

2. The control of the false discovery rate in multiple testing under dependency;Benjamini;Annal Statist,2001

3. Comparative analysis of biclustering algorithms

Cited by 8 articles.

1. Biclustering data analysis: a comprehensive survey;Briefings in Bioinformatics;2024-05-23

2. PWSC: a novel clustering method based on polynomial weight-adjusted sparse clustering for sparse biomedical data and its application in cancer subtyping;BMC Bioinformatics;2023-12-21

3. Metaheuristic Biclustering Algorithms: From State-of-the-art to Future Opportunities;ACM Computing Surveys;2023-10-06

4. Topological biclustering ARTMAP for identifying within bicluster relationships;Neural Networks;2023-03

5. Data-Driven Evolution Analysis and Trend Prediction of Hotspots in Global PPP Research;Buildings;2023-01-12