Genetic Programming for Evolving Similarity Functions for Clustering: Representations and Analysis

Author:

Lensen Andrew (1), Xue Bing (1), Zhang Mengjie (1)

Affiliation:

1. Evolutionary Computation Research Group, Victoria University of Wellington, Wellington 6140, New Zealand

Abstract

Clustering is a difficult and widely studied data mining task, with many varieties of clustering algorithms proposed in the literature. Nearly all algorithms use a similarity measure such as a distance metric (e.g., Euclidean distance) to decide which instances to assign to the same cluster. These similarity measures are generally predefined and cannot be easily tailored to the properties of a particular dataset, which leads to limitations in the quality and the interpretability of the clusters produced. In this article, we propose a new approach to automatically evolving similarity functions for a given clustering algorithm by using genetic programming. We introduce a new genetic programming-based method which automatically selects a small subset of features (feature selection) and then combines them using a variety of functions (feature construction) to produce dynamic and flexible similarity functions that are specifically designed for a given dataset. We demonstrate how the evolved similarity functions can be used to perform clustering using a graph-based representation. The results of a variety of experiments across a range of large, high-dimensional datasets show that the proposed approach can achieve higher and more consistent performance than the benchmark methods. We further extend the proposed approach to automatically produce multiple complementary similarity functions by using a multi-tree approach, which gives further performance improvements. We also analyse the interpretability and structure of the automatically evolved similarity functions to provide insight into how and why they are superior to standard distance metrics.

Publisher

MIT Press - Journals

Subject

Computational Mathematics

Cited by 14 articles.

1. Dimensionality Reduction for Classification Using Divide-and-Conquer Based Genetic Programming; 2024 IEEE Congress on Evolutionary Computation (CEC); 2024-06-30

2. A geometric semantic macro-crossover operator for evolutionary feature construction in regression; Genetic Programming and Evolvable Machines; 2023-12-08

3. A block padding approach in multidimensional dependency missing data; Engineering Applications of Artificial Intelligence; 2023-04

4. Sustainable semantic similarity assessment; Journal of Intelligent & Fuzzy Systems; 2022-09-22

5. On genetic programming representations and fitness functions for interpretable dimensionality reduction; Proceedings of the Genetic and Evolutionary Computation Conference; 2022-07-08
