Affiliation:
1. Bauman Moscow State Technical University
2. National Research University Higher School of Economics
Abstract
The paper considers the problem of reducing the dimension of multidimensional correlated indicators. One approach to this problem is based on the principal components method. This method compactly describes a vector with correlated coordinates (components) using a vector of principal components that has uncorrelated coordinates and a much smaller dimension, while retaining most of the information about the correlation structure of the original vector. Several modifications of the principal components method, differing in how the correlation matrix of the observation vector is estimated, were compared on simulated and real data. The objective of the work is to demonstrate the advantages of robust modifications of the principal components method when the data contain abnormal values. To compare the modifications on model data, a metric was introduced that measures the difference between the estimated and true eigenvalues of the correlation matrix of the initial data. The behavior of this metric as a function of the probability distribution of the observations was studied by computer simulation. The distributions chosen were multivariate distributions with non-diagonal correlation matrices that simulate a contaminated sample. Next, a sample of 13 correlated socioeconomic indicators for 85 countries was considered, in which 46 abnormal values were identified. All the modifications considered selected the same optimal number of principal components, namely three. However, the quality of compression of the real data, defined as the share of the total variance of the initial indicators explained by the first three principal components, turned out to be significantly higher for the robust modifications of the principal components method. The results obtained on these real data are in good agreement with the conclusions of the computer simulation.
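The abstract names the ingredients of the comparison without giving formulas. These are principal components computed from an estimated correlation matrix, a robust alternative to the sample (Pearson) correlation matrix, a metric comparing estimated and true eigenvalues, and compression quality as the share of total variance explained by the first k principal components. The following is a minimal Python/NumPy sketch of these ideas. It rests on assumptions that are not taken from the paper. The robust estimator here is Spearman's rank correlation rescaled by 2 sin(pi r / 6), one standard choice and not necessarily one of the modifications the authors compared. The eigenvalue metric is a hypothetical Euclidean distance between sorted eigenvalue vectors. Contamination is modeled by replacing a fraction `eps` of rows with independent high-variance noise. All function names are illustrative.

```python
import numpy as np


def spearman_correlation(x: np.ndarray) -> np.ndarray:
    """Rank-based correlation matrix, rescaled to the Pearson scale (valid under normality)."""
    # Column-wise ranks; ties are not handled (continuous data assumed).
    ranks = np.argsort(np.argsort(x, axis=0), axis=0).astype(float)
    r_s = np.corrcoef(ranks, rowvar=False)
    # For Gaussian data, rho = 2 * sin(pi * r_s / 6) maps Spearman's rho to Pearson's rho.
    return 2.0 * np.sin(np.pi * r_s / 6.0)


def sorted_eigenvalues(r: np.ndarray) -> np.ndarray:
    """Eigenvalues of a symmetric matrix in decreasing order."""
    return np.linalg.eigvalsh(r)[::-1]


def eigenvalue_error(r_hat: np.ndarray, r_true: np.ndarray) -> float:
    """Hypothetical metric: Euclidean distance between sorted eigenvalue vectors."""
    return float(np.linalg.norm(sorted_eigenvalues(r_hat) - sorted_eigenvalues(r_true)))


def explained_share(r: np.ndarray, k: int) -> float:
    """Share of total variance explained by the first k principal components."""
    lam = sorted_eigenvalues(r)
    return float(lam[:k].sum() / lam.sum())


def contaminated_sample(r_true, n, eps, scale, rng):
    """Gaussian sample with correlation r_true in which a fraction eps of rows is
    replaced by independent noise with standard deviation `scale` (assumed model)."""
    p = r_true.shape[0]
    x = rng.multivariate_normal(np.zeros(p), r_true, size=n)
    n_bad = int(round(eps * n))
    bad = rng.choice(n, size=n_bad, replace=False)
    x[bad] = rng.normal(0.0, scale, size=(n_bad, p))
    return x


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    p, rho = 5, 0.7
    # Equicorrelation matrix: ones on the diagonal, rho elsewhere.
    r_true = np.full((p, p), rho) + (1 - rho) * np.eye(p)
    x = contaminated_sample(r_true, n=500, eps=0.1, scale=10.0, rng=rng)

    estimates = {
        "Pearson": np.corrcoef(x, rowvar=False),
        "Spearman-based": spearman_correlation(x),
    }
    for name, r_hat in estimates.items():
        print(f"{name}: eigenvalue error = {eigenvalue_error(r_hat, r_true):.3f}, "
              f"share of first PC = {explained_share(r_hat, 1):.3f}")
```

In this toy setting, the outlying rows pull the Pearson correlations toward zero and flatten the eigenvalue spectrum. The rank-based estimate typically stays much closer to the true spectrum. Calling `explained_share(r_hat, 3)` corresponds to the compression-quality criterion that the abstract describes for three principal components.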
Publisher
Bauman Moscow State Technical University
Subject
General Physics and Astronomy, General Engineering, General Mathematics, General Chemistry, General Computer Science