Ensemble method for cluster number determination and algorithm selection in unsupervised learning

Published: 2022-05-25  Volume: 11  Page: 573
ISSN: 2046-1402
Container-title: F1000Research
Language: en
Short-container-title: F1000Res

Author:

Zambelli Antoine

Abstract

Unsupervised learning, and clustering in particular, requires considerable expertise to be used effectively. Researchers must make careful, informed decisions about which algorithm to apply to a given dataset and with which set of hyperparameters. They may also need to determine the number of clusters in the dataset, which is itself an input to most clustering algorithms, and all of this before they begin the subject-matter work they actually set out to do. After quantifying the impact of algorithm and hyperparameter selection, we propose an ensemble clustering framework that can be used with minimal input. It can determine both the number of clusters in the dataset and a suitable algorithm for that dataset. A code library is included in the Conclusions for ease of integration.
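The abstract does not spell out how the ensemble combines its members, so the following is only a minimal, hypothetical sketch of one way such a framework could work, not the author's code library. Two clustering algorithms (Lloyd's k-means with farthest-first initialisation and single-linkage agglomerative clustering, both hand-written here to stay dependency-free) are run over a range of candidate cluster counts k. Each result is scored with the silhouette coefficient, each algorithm votes for the k that maximises its score, the majority vote is taken as the number of clusters, and the best-scoring algorithm at that k is returned. The function names (`kmeans`, `single_linkage`, `silhouette`, `ensemble_select`), the silhouette criterion, and the majority-vote rule are all assumptions made for illustration.

```python
import math
from collections import Counter


def kmeans(points, k, iters=100):
    """Lloyd's k-means with deterministic farthest-first initialisation."""
    centers = [points[0]]
    while len(centers) < k:
        # Next center: the point farthest from its nearest existing center.
        centers.append(max(points, key=lambda p: min(math.dist(p, c) for c in centers)))
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point joins its nearest center.
        labels = [min(range(k), key=lambda j: math.dist(p, centers[j])) for p in points]
        # Update step: move each center to the mean of its members.
        new_centers = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centers.append(tuple(sum(coord) / len(members) for coord in zip(*members)))
            else:
                new_centers.append(centers[j])  # keep an empty cluster's center
        if new_centers == centers:
            break
        centers = new_centers
    return labels


def single_linkage(points, k):
    """Agglomerative clustering: repeatedly merge the two closest clusters."""
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > k:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Single linkage: distance between the closest pair of members.
                d = min(math.dist(points[i], points[j]) for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a].extend(clusters.pop(b))  # b > a, so index a is unaffected
    labels = [0] * len(points)
    for label, members in enumerate(clusters):
        for i in members:
            labels[i] = label
    return labels


def silhouette(points, labels):
    """Mean silhouette coefficient; singleton clusters score 0 by convention."""
    n = len(points)
    scores = []
    for i in range(n):
        same = [math.dist(points[i], points[j]) for j in range(n) if j != i and labels[j] == labels[i]]
        if not same:
            scores.append(0.0)
            continue
        a = sum(same) / len(same)  # mean intra-cluster distance
        b = min(  # mean distance to the nearest other cluster
            sum(math.dist(points[i], points[j]) for j in range(n) if labels[j] == other)
            / labels.count(other)
            for other in set(labels) if other != labels[i]
        )
        scores.append((b - a) / max(a, b))
    return sum(scores) / n


def ensemble_select(points, algorithms, k_range):
    """Hypothetical ensemble: majority vote on k, then best algorithm at that k."""
    table, votes = {}, {}
    for name, algo in algorithms.items():
        for k in k_range:
            table[(name, k)] = silhouette(points, algo(points, k))
        votes[name] = max(k_range, key=lambda k: table[(name, k)])
    consensus_k = Counter(votes.values()).most_common(1)[0][0]
    # Among algorithms that voted for the consensus k, keep the best-scoring one.
    best_algo = max((n for n in algorithms if votes[n] == consensus_k),
                    key=lambda n: table[(n, consensus_k)])
    return consensus_k, best_algo, votes


# Toy data: three unit squares of four points each, far apart from one another.
POINTS = [(x + dx, y + dy)
          for (x, y) in [(0, 0), (10, 0), (0, 10)]
          for (dx, dy) in [(0, 0), (1, 0), (0, 1), (1, 1)]]
ALGORITHMS = {"kmeans": kmeans, "single_linkage": single_linkage}

if __name__ == "__main__":
    k, algo, votes = ensemble_select(POINTS, ALGORITHMS, range(2, 6))
    print("consensus k:", k, "| selected algorithm:", algo, "| votes:", votes)
```

On these three well-separated blobs both algorithms vote for k = 3; when scores tie, the first-listed algorithm is returned. In practice one would likely swap the hand-written routines for library implementations such as scikit-learn's `KMeans`, `AgglomerativeClustering` and `sklearn.metrics.silhouette_score`, and majority voting over silhouette maxima is only one of several plausible aggregation rules.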

Publisher

F1000 Research Ltd

Subject

General Pharmacology, Toxicology and Pharmaceutics; General Immunology and Microbiology; General Biochemistry, Genetics and Molecular Biology; General Medicine

Link

https://f1000research.com/articles/11-573/v1/pdf
