Abstract
“Principal Component Analysis” (PCA) is an established linear technique for dimensionality reduction. It performs an orthonormal transformation to replace possibly correlated variables with a smaller set of linearly independent variables, the so-called principal components, which capture a large portion of the data variance. The problem of finding the optimal number of principal components has been widely studied for offline PCA. However, when working with streaming data, the optimal number changes continuously. This requires updating both the principal components and the dimensionality in every timestep. While the continuous update of the principal components is widely studied, the available algorithms for dimensionality adjustment in neural network-based and incremental PCA are limited to an increment of one. Therefore, existing approaches cannot account for abrupt changes in the presented data. The contribution of this work is to enable continuous dimensionality adjustment by an arbitrary number in neural network-based PCA without the necessity to learn all principal components. A novel algorithm is presented that utilizes several PCA characteristics to adaptively update the optimal number of principal components for neural network-based PCA. A precise estimation of the required dimensionality reduces the computational effort while ensuring that the desired amount of variance is kept. The computational complexity of the proposed algorithm is investigated, and the algorithm is benchmarked in an experimental study against other neural network-based and incremental PCA approaches, where it produces highly competitive results.
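To make the notion of a "desired amount of variance" concrete, the following is a minimal sketch of the offline baseline only, not the streaming algorithm proposed in this work. It assumes a fixed data batch and extracts principal components one at a time by power iteration with deflation, stopping as soon as the retained variance fraction reaches a target. The function names (`covariance`, `leading_eigenpair`, `components_for_variance`) and the `target` parameter are hypothetical and chosen for illustration.

```python
import math
import random


def covariance(rows):
    """Sample covariance matrix of a list of equal-length rows."""
    n, d = len(rows), len(rows[0])
    mean = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - mean[j] for j in range(d)] for r in rows]
    return [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
            for i in range(d)]


def leading_eigenpair(mat, iters=500, seed=0):
    """Power iteration: dominant eigenvalue and unit eigenvector
    of a symmetric positive semi-definite matrix."""
    rng = random.Random(seed)
    d = len(mat)
    v = [rng.gauss(0, 1) for _ in range(d)]
    norm = math.sqrt(sum(x * x for x in v))
    v = [x / norm for x in v]
    for _ in range(iters):
        w = [sum(mat[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm < 1e-12:  # matrix is numerically zero (fully deflated)
            return 0.0, v
        v = [x / norm for x in w]
    # Rayleigh quotient v^T M v gives the eigenvalue estimate.
    eigval = sum(v[i] * sum(mat[i][j] * v[j] for j in range(d))
                 for i in range(d))
    return eigval, v


def components_for_variance(rows, target=0.95):
    """Extract leading components until they explain `target` of the variance.

    Returns (components, explained_fraction). Illustrative offline sketch;
    assumption: the batch fits in memory and has non-zero total variance.
    """
    cov = covariance(rows)
    d = len(cov)
    total = sum(cov[i][i] for i in range(d))  # trace = total variance
    if total <= 0:
        raise ValueError("data has zero variance")
    components, explained = [], 0.0
    while len(components) < d and explained / total < target:
        eigval, vec = leading_eigenpair(cov)
        components.append(vec)
        explained += eigval
        # Deflation: remove the variance along the component just found.
        cov = [[cov[i][j] - eigval * vec[i] * vec[j] for j in range(d)]
               for i in range(d)]
    return components, explained / total
```

Because components are extracted in order of decreasing variance, the loop never computes more components than the target requires. In a streaming setting the covariance structure drifts, so a one-off computation like this would have to be repeated or replaced by an adaptive scheme.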
Funder
Ministerium für Wirtschaft, Innovation, Digitalisierung und Energie des Landes Nordrhein-Westfalen
Publisher
Public Library of Science (PLoS)
Cited by
26 articles.