GenClust: A genetic algorithm for clustering gene expression data

Author:

Di Gesú Vito, Giancarlo Raffaele, Lo Bosco Giosué, Raimondi Alessandra, Scaturro Davide

Abstract

Background: Clustering is a key step in the analysis of gene expression data. Many classical clustering algorithms are used for the task, and more innovative ones have been designed and validated specifically for it. Artificial intelligence techniques are widely used in bioinformatics and, more generally, in data analysis. Even so, very few clustering algorithms are based on the genetic paradigm, although that paradigm has great potential for finding good heuristic solutions to a difficult optimization problem such as clustering.

Results: GenClust is a new genetic algorithm for clustering gene expression data. It has two key features: (a) a novel coding of the search space that is simple, compact and easy to update; (b) it can be used naturally in conjunction with data-driven internal validation methods. We have experimented with the FOM (figure of merit) methodology, which was specifically conceived for validating clusters of gene expression data. The validity of GenClust has been assessed experimentally on real data sets, both with validation measures and in comparison with other algorithms, namely Average Link, Cast, Click and K-means.

Conclusion: Experiments show that none of the algorithms we have used is markedly superior to the others across data sets and validation measures. In many cases the observed differences between the worst- and best-performing algorithms may be statistically insignificant, so the algorithms could be considered equivalent. However, there are cases in which one algorithm may be better than the others and therefore worth choosing. In particular, experiments with GenClust show that its data representation is simple, yet it converges very rapidly to a local optimum. Its ability to identify meaningful clusters is comparable, and sometimes superior, to that of more sophisticated algorithms. In addition, it is well suited for use in conjunction with data-driven internal validation measures, in particular the FOM methodology.
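
The abstract does not describe GenClust's coding or genetic operators, so what follows is only a minimal Python sketch built on stated assumptions. It shows the two ingredients the abstract pairs together. The first is a generic genetic algorithm for clustering. The second is a FOM score computed in the usual leave-one-condition-out way. The names `ga_cluster` and `fom` are hypothetical, as are the label-vector encoding (one cluster id per gene), the sum-of-squares fitness, truncation selection, uniform crossover and all parameter values. They are illustrative choices, not GenClust's actual design.

```python
import numpy as np


def ga_cluster(X, k, pop_size=20, generations=50, mutation_rate=0.05, seed=0):
    """Hypothetical generic GA for clustering (NOT GenClust's actual coding).

    Each individual is a label vector assigning one of k clusters to every row
    (gene) of X; fitness is the within-cluster sum of squares (lower is better).
    """
    rng = np.random.default_rng(seed)
    n = X.shape[0]

    def sse(labels):
        # Sum of squared distances of each gene to its cluster's mean profile.
        total = 0.0
        for c in range(k):
            members = X[labels == c]
            if len(members):
                total += ((members - members.mean(axis=0)) ** 2).sum()
        return total

    pop = rng.integers(0, k, size=(pop_size, n))
    for _ in range(generations):
        order = np.argsort([sse(ind) for ind in pop])
        # Truncation selection: the best half survives unchanged (elitism).
        parents = pop[order[: max(1, pop_size // 2)]]
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = parents[rng.integers(len(parents), size=2)]
            child = np.where(rng.random(n) < 0.5, a, b)  # uniform crossover
            flip = rng.random(n) < mutation_rate  # random-reset mutation
            child[flip] = rng.integers(0, k, size=flip.sum())
            children.append(child)
        pop = np.vstack([parents] + children)
    return min(pop, key=sse)


def fom(X, k, cluster_fn):
    """Aggregate figure of merit (lower is better), leave-one-condition-out.

    For each condition (column) e: cluster the genes on the remaining
    conditions, then take the root-mean-square deviation of condition e
    from the means of the resulting clusters; sum over all conditions.
    """
    n, m = X.shape
    total = 0.0
    for e in range(m):
        labels = np.asarray(cluster_fn(np.delete(X, e, axis=1), k))
        left_out = X[:, e]
        sq = 0.0
        for c in np.unique(labels):
            vals = left_out[labels == c]
            sq += ((vals - vals.mean()) ** 2).sum()
        total += np.sqrt(sq / n)
    return total


# Toy data: 20 genes x 4 conditions drawn from two well-separated groups.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 0.1, (10, 4)), rng.normal(3.0, 0.1, (10, 4))])
labels = ga_cluster(X, 2)
print(labels)
print(fom(X, 2, ga_cluster))
```

Here the FOM is computed independently of the GA's own fitness, so the same `fom` function can score any clustering routine with the signature `cluster_fn(data, k)`. Lower values indicate clusters that better predict the held-out condition.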

Publisher

Springer Science and Business Media LLC

Subject

Applied Mathematics, Computer Science Applications, Molecular Biology, Biochemistry, Structural Biology
