Authors:
Ali Shallaw Mohammed, Kecskemeti Gabor
Abstract
Clustering is widely used in cloud computing studies to extract vital information. However, these studies have not investigated how much clustering quality could improve through better selection of the clustering dimensions and methods. Consequently, developing an automated technique to perform such a selection has not been thoroughly addressed. Most recent attempts relied either on feature reduction or on general, non-automated techniques, which were deemed unreliable for making an adequate selection. Therefore, we first conducted a comprehensive investigation of the impact of selecting better clustering dimensions and methods. Our results indicate that better selection significantly improves clustering quality, by 15–70 percentage points. Then, we developed a novel technique (EFection) that detects the best selection in advance by combining an internal validation metric (the Davies–Bouldin index) with the Pearson correlation coefficient. We evaluated our technique's accuracy by comparing the clustering quality of its suggestions with that of the optimal selection. We then compared EFection's performance against recent approaches to quantify its advantage. Finally, we validated its applicability by adopting it in clustering-based cloud studies. The results show that EFection offers high accuracy, around 83%, and surpasses prior art by 11%.
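The abstract names the two ingredients EFection combines, the Davies–Bouldin (DB) index and the Pearson correlation coefficient, but not how it combines them. The sketch below is therefore only an illustration under stated assumptions, not the published EFection algorithm: it computes both measures from scratch and uses them in one hypothetical way to rank pairs of candidate clustering dimensions. The function names (`pearson`, `davies_bouldin`, `kmeans`, `rank_dimension_pairs`), the use of plain k-means as the clustering method, and the 0.9 correlation cutoff are all assumptions. Lower DB values indicate more compact, better-separated clusters.

```python
import math
from itertools import combinations


def pearson(xs, ys):
    """Pearson correlation coefficient r of two equal-length sequences.
    Assumption: a constant column is treated as uncorrelated (returns 0.0)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        return 0.0
    return cov / (sx * sy)


def davies_bouldin(points, labels):
    """Davies-Bouldin index with Euclidean distance (lower is better).
    Needs at least two clusters with distinct centroids."""
    clusters = sorted(set(labels))
    cent, scatter = {}, {}
    for c in clusters:
        members = [p for p, lab in zip(points, labels) if lab == c]
        cent[c] = tuple(sum(col) / len(members) for col in zip(*members))
        # S_c: mean distance of the cluster's points to its centroid
        scatter[c] = sum(math.dist(p, cent[c]) for p in members) / len(members)
    # For each cluster, keep its worst (largest) similarity ratio R_ij
    worst = [
        max((scatter[i] + scatter[j]) / math.dist(cent[i], cent[j])
            for j in clusters if j != i)
        for i in clusters
    ]
    return sum(worst) / len(clusters)


def kmeans(points, k, iters=20):
    """Plain Lloyd's k-means seeded with the first k points (deterministic;
    real studies would use k-means++ or several restarts)."""
    centroids = [tuple(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                  for p in points]
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels


def rank_dimension_pairs(rows, k, max_abs_r=0.9):
    """HYPOTHETICAL scoring rule, not EFection itself: skip strongly
    correlated dimension pairs, cluster the rest with k-means, and sort
    the remaining pairs by DB index (best first)."""
    columns = list(zip(*rows))
    scored = []
    for a, b in combinations(range(len(columns)), 2):
        if abs(pearson(columns[a], columns[b])) > max_abs_r:
            continue  # near-duplicate dimensions add little new information
        pts = [(r[a], r[b]) for r in rows]
        labels = kmeans(pts, k)
        if len(set(labels)) < 2:
            continue  # DB index is undefined for a single cluster
        scored.append(((a, b), davies_bouldin(pts, labels)))
    return sorted(scored, key=lambda item: item[1])


if __name__ == "__main__":
    toy = [(0.0, 0.0, 1.0), (10.0, 20.0, 1.0), (0.0, 0.0, 3.0), (10.0, 20.0, 3.0)]
    print(rank_dimension_pairs(toy, k=2))  # [((1, 2), 0.1), ((0, 2), 0.2)]
```

On the toy table, dimensions 0 and 1 are perfectly correlated (r = 1), so that pair is skipped, and pair (1, 2) ranks first only because dimension 1 places the two clusters twice as far apart as dimension 0 does. Because the DB index is not scale-invariant, columns would normally be standardized before such a comparison.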
Publisher
Springer Science and Business Media LLC