Affiliation:
1. Distributed Computing Systems, Belfast, UK
Abstract
This paper presents a clustering algorithm that extends the Category Trees algorithm. Category Trees is a clustering method that creates tree structures that branch on category type rather than on feature. The development in this paper is to consider a secondary order of clustering: not the category to which a data row belongs, but the tree, representing a single classifier, with which it is eventually clustered. Each tree branches to store subsets of other categories, but the rows in those subsets may also be related. This paper is therefore concerned with that second level of clustering between the category subsets, to try to determine whether there is any consistency over it. It is argued that Principal Components may be a related and reciprocal type of structure, which raises a broader question about the relation between exemplars and principal components in general. The theory is demonstrated using the Portugal Forest Fires dataset as a case study. The Category Trees are then combined with other Self-Organising algorithms from the author, and it is suggested that they all belong to the same family type, which is an Entropy-style of classifier. Some analysis of classifier types is also presented.
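As a rough illustration of what "consistency" over the second level of clustering could mean, the sketch below assumes that each data row already has a category label and the identifier of the tree it was eventually clustered with. It cross-tabulates the two and reports, per category, the Shannon entropy of that category's distribution over trees. This is a hypothetical measure chosen for illustration only; it is not the Category Trees algorithm itself, and the function name `tree_consistency` and the toy labels are invented for the example.

```python
from collections import Counter, defaultdict
import math


def tree_consistency(categories, trees):
    """Hypothetical consistency measure (not from the paper).

    For each category, return the Shannon entropy (in bits) of the
    distribution of that category's rows over the trees they were
    clustered with. 0.0 means every row of the category ended up in the
    same tree; larger values mean the rows are spread over more trees.
    """
    if len(categories) != len(trees):
        raise ValueError("categories and trees must have the same length")

    # Cross-tabulate: category -> Counter of tree ids.
    counts = defaultdict(Counter)
    for cat, tree in zip(categories, trees):
        counts[cat][tree] += 1

    result = {}
    for cat, tree_counts in counts.items():
        total = sum(tree_counts.values())
        h = 0.0
        for n in tree_counts.values():
            p = n / total
            h -= p * math.log2(p)
        result[cat] = h
    return result


# Toy data (invented): category label and assigned tree id for each row.
cats = ["A", "A", "A", "B", "B", "B", "B"]
trees = ["T1", "T1", "T1", "T2", "T3", "T2", "T3"]
print(tree_consistency(cats, trees))  # {'A': 0.0, 'B': 1.0}
```

Here category A is perfectly consistent (all rows go to one tree), while category B is split evenly between two trees, giving one bit of entropy. Entropy is used simply because it is a standard, label-agnostic way to summarise how concentrated a distribution is.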
Publisher
World Scientific and Engineering Academy and Society (WSEAS)
References (25 articles):
1. Greer, K. (2018). An Improved Oscillating-Error Classifier with Branching, WSEAS Transactions on Computer Research, Vol. 6, pp. 49-54. E-ISSN: 2415-1521. For the updated version, see Category Trees (2020), available on arXiv at https://arxiv.org/abs/1811.02617.
2. Wold, S., Esbensen, K. and Geladi, P. (1987). Principal Component Analysis. Chemometrics and Intelligent Laboratory Systems, Vol. 2, No. 1-3, pp. 37-52.
3. Cortez, P. and Morais, A. (2007). A Data Mining Approach to Predict Forest Fires using Meteorological Data. In J. Neves, M. F. Santos and J. Machado Eds., New Trends in Artificial Intelligence, Proceedings of the 13th EPIA 2007 - Portuguese Conference on Artificial Intelligence, December, Guimaraes, Portugal, pp. 512-523. APPIA, ISBN-13 978-989-95618-0-9.
4. El Nino Dataset, http://archive.ics.uci.edu/ml/datasets/El+Nino.
5. Bay, S.D., Kibler, D.F., Pazzani, M.J. and Smyth, P. (2000). The UCI KDD Archive of Large Data Sets for Data Mining Research and Experimentation. SIGKDD Explorations, Vol. 2.