Genetic Programming for Evolving Similarity Functions for Clustering: Representations and Analysis

Published:2020-12 Issue:4 Volume:28 Page:531-561
ISSN:1063-6560
Container-title:Evolutionary Computation
language:en
Short-container-title:Evolutionary Computation

Author:

Lensen Andrew¹, Xue Bing¹, Zhang Mengjie¹

Affiliation:

1. Evolutionary Computation Research Group, Victoria University of Wellington, Wellington 6140, New Zealand

Abstract

Clustering is a difficult and widely studied data mining task, with many varieties of clustering algorithms proposed in the literature. Nearly all algorithms use a similarity measure such as a distance metric (e.g., Euclidean distance) to decide which instances to assign to the same cluster. These similarity measures are generally predefined and cannot be easily tailored to the properties of a particular dataset, which leads to limitations in the quality and the interpretability of the clusters produced. In this article, we propose a new approach to automatically evolving similarity functions for a given clustering algorithm by using genetic programming. We introduce a new genetic programming-based method which automatically selects a small subset of features (feature selection) and then combines them using a variety of functions (feature construction) to produce dynamic and flexible similarity functions that are specifically designed for a given dataset. We demonstrate how the evolved similarity functions can be used to perform clustering using a graph-based representation. The results of a variety of experiments across a range of large, high-dimensional datasets show that the proposed approach can achieve higher and more consistent performance than the benchmark methods. We further extend the proposed approach to automatically produce multiple complementary similarity functions by using a multi-tree approach, which gives further performance improvements. We also analyse the interpretability and structure of the automatically evolved similarity functions to provide insight into how and why they are superior to standard distance metrics.
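As a rough illustration of the idea described in the abstract, the following Python sketch shows how a similarity function built from a small subset of features could drive a graph-based clustering. It is not the authors' method: the `similarity` expression is written by hand to stand in for an evolved genetic programming tree, and the feature indices, the threshold, and the use of connected components as the clustering step are all assumptions made for illustration only.

```python
from itertools import combinations


def similarity(a, b):
    """Hypothetical stand-in for an evolved similarity function.

    Feature selection: only features 0 and 2 are used (feature 1 is ignored).
    Feature construction: the two per-feature differences are combined
    with max/min and a weight, as a GP tree might do.
    """
    d0 = abs(a[0] - b[0])
    d2 = abs(a[2] - b[2])
    return 1.0 / (1.0 + max(d0, d2) + 0.5 * min(d0, d2))


def cluster(points, threshold):
    """Graph-based clustering sketch (assumed, not from the paper).

    Instances are nodes; an edge joins two instances whose similarity
    is at least `threshold`. Each connected component is one cluster.
    """
    parent = list(range(len(points)))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(points)), 2):
        if similarity(points[i], points[j]) >= threshold:
            parent[find(i)] = find(j)

    # Relabel component roots as 0, 1, 2, ... in order of first appearance.
    labels = {}
    return [labels.setdefault(find(i), len(labels)) for i in range(len(points))]


# Toy data: feature 1 is pure noise and is ignored by the similarity function.
points = [(0.0, 9.0, 0.0), (0.1, -5.0, 0.2), (5.0, 0.0, 5.0), (5.2, 100.0, 5.1)]
result = cluster(points, threshold=0.5)
print(result)  # [0, 0, 1, 1]
```

Because the noisy second feature never enters the similarity function, the two groups separate cleanly here, whereas plain Euclidean distance over all three features would be dominated by that feature.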

Publisher

MIT Press - Journals

Subject

Computational Mathematics

Link

https://www.mitpressjournals.org/doi/pdf/10.1162/evco_a_00264

References: 47 articles.

1. Data Classification

2. A Genetic Programming Approach to Data Clustering

Cited by 14 articles.

1. Dimensionality Reduction for Classification Using Divide-and-Conquer Based Genetic Programming;2024 IEEE Congress on Evolutionary Computation (CEC);2024-06-30

2. A geometric semantic macro-crossover operator for evolutionary feature construction in regression;Genetic Programming and Evolvable Machines;2023-12-08

3. A block padding approach in multidimensional dependency missing data;Engineering Applications of Artificial Intelligence;2023-04

4. Sustainable semantic similarity assessment;Journal of Intelligent & Fuzzy Systems;2022-09-22

5. On genetic programming representations and fitness functions for interpretable dimensionality reduction;Proceedings of the Genetic and Evolutionary Computation Conference;2022-07-08