Band-based similarity indices for gene expression classification and clustering

Published:2021-11-03 Issue:1 Volume:11 Page:
ISSN:2045-2322
Container-title:Scientific Reports
language:en
Short-container-title:Sci Rep

Author:

Torrente Aurora

Abstract

The concept of depth induces an ordering from centre outwards in multivariate data. Most depth definitions are unfeasible for dimensions larger than three or four, but the Modified Band Depth (MBD) is a notable exception that has proven to be a valuable tool in the analysis of high-dimensional gene expression data. This depth definition relates the centrality of each individual to its (partial) inclusion in all possible bands formed by elements of the data set. We assess (dis)similarity between pairs of observations by accounting for such bands and constructing binary matrices associated with each pair. From these, contingency tables are calculated and used to derive standard similarity indices. Our approach is computationally efficient and can be applied to bands formed by any number of observations from the data set. We have evaluated the performance of several band-based similarity indices with respect to that of other classical distances in standard classification and clustering tasks in a variety of simulated and real data sets. However, the method is not restricted to these indices; the extension to other similarity coefficients is straightforward. Our experiments show the benefits of our technique, with some of the selected indices outperforming, among others, the Euclidean distance.
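
The pipeline outlined in the abstract (bands, per-pair binary matrices, a contingency table, a similarity index) can be illustrated with a minimal sketch. This is one plausible reading of the abstract, not the paper's implementation: the function names, the choice of bands built from j = 2 observations, the per-coordinate inclusion test, and the Jaccard coefficient as the example index are all assumptions made here for illustration. It is also a brute-force enumeration and does not reflect the computational efficiency the abstract claims.

```python
from itertools import combinations


def band_membership(data, idx, j=2):
    """Binary matrix for observation `idx` (hypothetical helper).

    Rows are bands, i.e. all j-subsets of the observations; columns are
    coordinates. An entry is 1 when the observation lies inside the band's
    [min, max] range at that coordinate (assumed notion of partial inclusion).
    """
    x = data[idx]
    rows = []
    for subset in combinations(range(len(data)), j):
        row = []
        for k, value in enumerate(x):
            lo = min(data[i][k] for i in subset)
            hi = max(data[i][k] for i in subset)
            row.append(1 if lo <= value <= hi else 0)
        rows.append(row)
    return rows


def contingency(mx, my):
    """2x2 counts over all (band, coordinate) cells of two binary matrices.

    a: both observations inside, b: only the first, c: only the second,
    d: neither.
    """
    a = b = c = d = 0
    for row_x, row_y in zip(mx, my):
        for u, v in zip(row_x, row_y):
            if u and v:
                a += 1
            elif u:
                b += 1
            elif v:
                c += 1
            else:
                d += 1
    return a, b, c, d


def jaccard(a, b, c, d):
    """Jaccard similarity from the contingency counts (d is ignored)."""
    denom = a + b + c
    return a / denom if denom else 0.0


# Toy data: three observations measured at two coordinates.
data = [[1, 5], [2, 4], [3, 6]]
m0 = band_membership(data, 0)
m1 = band_membership(data, 1)
print(contingency(m0, m1), jaccard(*contingency(m0, m1)))
```

Any other coefficient defined on the counts a, b, c, d (for instance simple matching, (a + d) / (a + b + c + d)) could be swapped in for `jaccard` without changing the rest of the sketch.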

Funder

Ministerio de Ciencia e Innovación

Comunidad de Madrid

Publisher

Springer Science and Business Media LLC

Subject

Multidisciplinary

Link

https://www.nature.com/articles/s41598-021-00678-9.pdf


Cited by 1 article.

1. Comparative analysis of cell line similarity algorithms in oncology treatment. Procedia Computer Science (2024).