A simple clustering technique to extract subsets of data for function approximation

Published: 2015-04-28 Issue: 5 Volume: 17 Page: 719-732
ISSN: 1464-7141
Container-title: Journal of Hydroinformatics
Language: en

Author:

Karunasingha Dulakshi Santhusitha Kumari¹, Liong Shie-Yui²

Affiliation:

1. Department of Engineering Mathematics, Faculty of Engineering, University of Peradeniya, Peradeniya, Sri Lanka

2. Tropical Marine Science Institute, National University of Singapore, Singapore 119223, Singapore

Abstract

A simple clustering method is proposed for extracting representative subsets from lengthy data sets. The extracted subset is intended to replace the entire large data set when building prediction models (that is, models that approximate functional relationships). Such smaller subsets are often required in the exploratory stages of studies that involve resource-consuming investigations. A few recent studies have used the subtractive clustering method (SCM) for such data extraction, in the absence of clustering methods designed for function approximation. SCM, however, requires several parameters to be specified. This study proposes a clustering method that requires only a single parameter, yet is shown to be as effective as SCM. A method for finding suitable values of the parameter is also proposed. Because it has only a single parameter, the proposed clustering method is shown to be orders of magnitude more efficient to use than SCM. The effectiveness of the proposed method is demonstrated on phase space prediction of three univariate time series and on prediction of two multivariate data sets. Some drawbacks of SCM when applied to data extraction are identified, and the proposed method is shown to overcome them.

Publisher

IWA Publishing

Subject

Atmospheric Science, Geotechnical Engineering and Engineering Geology, Civil and Structural Engineering, Water Science and Technology

Link

http://iwaponline.com/jh/article-pdf/17/5/719/388062/jh0170719.pdf

Reference: 38 articles.

1. Analysis of Observed Chaotic Data

2. Optimized fixed-size kernel models for large data sets;Brabanter;Computational Statistics & Data Analysis,2010

3. Fuzzy model identification based on cluster estimation;Chiu;Journal of Intelligent and Fuzzy Systems,1994

4. Derivation of effective and efficient data set with subtractive clustering method and genetic algorithm;Doan;Journal of Hydroinformatics,2005

5. Multivariate adaptive regression splines;Friedman;Annals of Statistics,1991

Cited by 1 article.

1. Self-organizing map clustering technique for ANN-based spatiotemporal modeling of groundwater quality parameters;Journal of Hydroinformatics;2015-09-09