Author:
Camiz Sergio,Pillar Valério
Abstract
The identification of a reduced dimensional representation of the data is among the main issues of exploratory multidimensional data analysis and several solutions had been proposed in the literature according to the method. Principal Component Analysis (PCA) is the method that has received the largest attention thus far and several identification methods—the so-called stopping rules—have been proposed, giving very different results in practice, and some comparative study has been carried out. Some inconsistencies in the previous studies led us to try to fix the distinction between signal from noise in PCA—and its limits—and propose a new testing method. This consists in the production of simulated data according to a predefined eigenvalues structure, including zero-eigenvalues. From random populations built according to several such structures, reduced-size samples were extracted and to them different levels of random normal noise were added. This controlled introduction of noise allows a clear distinction between expected signal and noise, the latter relegated to the non-zero eigenvalues in the samples corresponding to zero ones in the population. With this new method, we tested the performance of ten different stopping rules. Of every method, for every structure and every noise, both power (the ability to correctly identify the expected dimension) and type-I error (the detection of a dimension composed only by noise) have been measured, by counting the relative frequencies in which the smallest non-zero eigenvalue in the population was recognized as signal in the samples and that in which the largest zero-eigenvalue was recognized as noise, respectively. This way, the behaviour of the examined methods is clear and their comparison/evaluation is possible. The reported results show that both the generalization of the Bartlett’s test by Rencher and the Bootstrap method by Pillar result much better than all others: both are accounted for reasonable power, decreasing with noise, and very good type-I error. Thus, more than the others, these methods deserve being adopted.
Funder
Conselho Nacional de Desenvolvimento Científico e Tecnológico
Subject
General Mathematics,Engineering (miscellaneous),Computer Science (miscellaneous)
Reference53 articles.
1. Robust Estimates, Residuals, and Outlier Detection with Multiresponse Data
2. Principal Component Analysis;Jolliffe,2002
3. Methods of Multivariate Analysis;Rencher,2002
4. Statistique Exploratoire Multidimensionnelle—Visualisation et Inférence en Fouilles de Données;Lebart,2016
5. Some necessary conditions for common-factor analysis
Cited by
11 articles.
订阅此论文施引文献
订阅此论文施引文献,注册后可以免费订阅5篇论文的施引文献,订阅后可以查看论文全部施引文献