Coral: an integrated suite of visualizations for comparing clusterings

Published:2012-10-29 Issue:1 Volume:13
ISSN:1471-2105
Container-title:BMC Bioinformatics
language:en
Short-container-title:BMC Bioinformatics

Author:

Filippova Darya,Gadani Aashish,Kingsford Carl

Abstract

Background: Clustering has become a standard analysis for many types of biological data (e.g., interaction networks, gene expression, metagenomic abundance). In practice, it is possible to obtain a large number of contradictory clusterings by varying which clustering algorithm is used, which data attributes are considered, how algorithmic parameters are set, and which near-optimal clusterings are chosen. It is difficult to sift through such a large collection of varied clusterings to determine which clustering features are affected by parameter settings or are artifacts of particular algorithms, and which represent meaningful patterns. Knowing which items are often clustered together helps to improve our understanding of the underlying data and to increase our confidence in the generated modules.

Results: We present Coral, an application for interactive exploration of large ensembles of clusterings. Coral makes all-to-all clustering comparison easy, supports exploration of individual clusterings, allows tracking modules across clusterings, and supports identification of core and peripheral items in modules. We discuss how each visual component in Coral tackles a specific question related to clustering comparison and provide examples of their use. We also show how Coral could be used to visually and quantitatively compare clusterings with a ground truth clustering.

Conclusion: As a case study, we compare clusterings of a recently published protein interaction network of Arabidopsis thaliana. We use several popular algorithms to generate the network’s clusterings. We find that the clusterings vary significantly and that few proteins are consistently co-clustered in all clusterings. This is evidence that several clusterings should typically be considered when evaluating modules of genes, proteins, or sequences, and Coral can be used to perform a comprehensive analysis of these clustering ensembles.
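The idea of items being "consistently co-clustered" across an ensemble can be illustrated with a small sketch. This is not Coral's implementation: the function names, the toy ensemble, and the choice of a Jaccard index over co-clustered pairs as the pairwise similarity are all assumptions made for illustration, since the abstract does not name specific metrics.

```python
from collections import Counter
from itertools import combinations


def co_clustered_pairs(clustering):
    """Return the set of unordered item pairs that share a module."""
    pairs = set()
    for module in clustering:
        # Sort so each pair has one canonical (a, b) ordering.
        for a, b in combinations(sorted(module), 2):
            pairs.add((a, b))
    return pairs


def co_clustering_counts(ensemble):
    """Count, for each item pair, how many clusterings co-cluster it."""
    counts = Counter()
    for clustering in ensemble:
        counts.update(co_clustered_pairs(clustering))
    return counts


def jaccard(c1, c2):
    """Jaccard index over co-clustered pairs (an assumed similarity measure)."""
    p1, p2 = co_clustered_pairs(c1), co_clustered_pairs(c2)
    union = p1 | p2
    return len(p1 & p2) / len(union) if union else 1.0


# Hypothetical toy ensemble: three clusterings of items A..E.
ensemble = [
    [{"A", "B", "C"}, {"D", "E"}],
    [{"A", "B"}, {"C", "D", "E"}],
    [{"A", "B", "C", "D"}, {"E"}],
]

counts = co_clustering_counts(ensemble)
# Pairs co-clustered in every clustering of the ensemble.
core_pairs = sorted(p for p, n in counts.items() if n == len(ensemble))
print(core_pairs)
print(jaccard(ensemble[0], ensemble[1]))
```

In this toy ensemble only one pair is placed together by all three clusterings, which mirrors the abstract's observation that few items are consistently co-clustered when algorithms or parameters vary.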

Publisher

Springer Science and Business Media LLC

Subject

Applied Mathematics,Computer Science Applications,Molecular Biology,Biochemistry,Structural Biology

Link

https://link.springer.com/content/pdf/10.1186/1471-2105-13-276.pdf

Cited by 8 articles.

1. Interactive exploration of large time-dependent bipartite graphs;Journal of Computer Languages;2020-04

2. Grouping of complex substances using analytical chemistry data: A framework for quantitative evaluation and visualization;PLOS ONE;2019-10-10

3. Task-Driven Comparison of Topic Models;IEEE Transactions on Visualization and Computer Graphics;2016-01-31

4. Generalist Species Have a Central Role In a Highly Diverse Plant-Frugivore Network;Biotropica;2016-01-28

5. Data Visualization and Structure Identification;Information Science for Materials Discovery and Design;2015-12-13