Affiliation:
1. Atmospheric and Oceanic Sciences, University of Wisconsin–Madison, Madison, Wisconsin
Abstract
A simple yet flexible and robust algorithm is described for fully partitioning an arbitrary dataset into compact, nonoverlapping groups or classes, sorted by size, based entirely on a pairwise similarity matrix and a user-specified similarity threshold. Unlike many clustering algorithms, there is no assumption that natural clusters exist in the dataset, although clusters, when present, may be preferentially assigned to one or more classes. The method also does not require data objects to be compared within any coordinate system but rather permits the user to define pairwise similarity using almost any conceivable criterion. The method therefore lends itself to certain geoscientific applications for which conventional clustering methods are unsuited, including two nontrivial and distinctly different datasets presented as examples. In addition to identifying large classes containing numerous similar dataset members, the method is also well suited for isolating rare or anomalous members of a dataset. The method is inductive in that prototypes identified in a representative subset of a larger dataset can be used to classify the remainder.
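The abstract does not give the algorithm's steps, so the following is only a minimal sketch of one way a threshold-based partition of this kind could work, not the author's published procedure. It assumes a greedy rule: the unassigned member with the most unassigned neighbors at or above the threshold becomes a prototype, it and those neighbors form a class, and the process repeats until nothing is left. The names `partition_by_similarity` and `assign_to_class`, the greedy rule, and the "first matching prototype" rule for new objects are all hypothetical choices made for illustration.

```python
def partition_by_similarity(sim, threshold):
    """Greedy partition sketch (an assumption, not the paper's exact method).

    sim: symmetric n x n list of lists of pairwise similarities.
    Returns a list of (prototype_index, member_indices) tuples, in the
    order the classes were formed.
    """
    unassigned = set(range(len(sim)))
    classes = []
    while unassigned:
        best_proto, best_members = None, []
        # Scan candidates in index order so ties resolve deterministically.
        for i in sorted(unassigned):
            members = [j for j in unassigned
                       if j == i or sim[i][j] >= threshold]
            if len(members) > len(best_members):
                best_proto, best_members = i, members
        classes.append((best_proto, sorted(best_members)))
        unassigned -= set(best_members)
    return classes


def assign_to_class(sims_to_prototypes, threshold):
    """Assign a new object to the first class whose prototype it matches.

    sims_to_prototypes: similarity of the new object to each prototype,
    in class order. Returns a class index, or None if nothing matches.
    """
    for k, s in enumerate(sims_to_prototypes):
        if s >= threshold:
            return k
    return None


# Toy similarity matrix: members 0-2 are mutually similar, 3-4 are
# similar to each other, and member 5 resembles nothing else.
sim = [
    [1.0, 0.9, 0.8, 0.1, 0.2, 0.0],
    [0.9, 1.0, 0.7, 0.2, 0.1, 0.1],
    [0.8, 0.7, 1.0, 0.1, 0.1, 0.0],
    [0.1, 0.2, 0.1, 1.0, 0.6, 0.1],
    [0.2, 0.1, 0.1, 0.6, 1.0, 0.0],
    [0.0, 0.1, 0.0, 0.1, 0.0, 1.0],
]
classes = partition_by_similarity(sim, threshold=0.5)
print(classes)
```

Under this greedy rule the classes come out in nonincreasing size, because removing members can only reduce the neighbor count of those that remain. The anomalous member ends up alone in its own class, which matches the abstract's point about isolating rare members. Because only the matrix `sim` is consulted, the similarity criterion itself can be anything the user defines.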
Funder
National Aeronautics and Space Administration
Publisher
American Meteorological Society