Affiliation:
1. University of California San Diego, USA, and Halıcıoğlu Data Science Institute, USA
2. Bangladesh University of Engineering and Technology (BUET), Bangladesh
Abstract
We propose a variant of Principal Component Analysis (PCA) that is suited to real-time applications. In the real-time version of the PCA problem, we maintain a window over the most recent data and project every incoming row of data into a lower-dimensional subspace, which we generate as the output of the model. The goal is to reduce the reconstruction error of the output with respect to the input and to retain the major components pertaining to previous distributions of the data. We use the reconstruction error as the termination criterion when updating the eigenspace as new data arrives. We then propose two variants of this algorithm that are progressively more time efficient. To verify whether our proposed model can capture the essence of the changing distribution of large datasets in real time, we have implemented the algorithms and evaluated their performance on carefully designed simulations that change the distributions of data sources over time in a controllable manner. Furthermore, we have demonstrated that the proposed algorithms can capture the changing distributions of real-life datasets by running simulations on datasets from a variety of real-time applications, e.g., localization, activity recognition, and customer expenditure. Results show that straightforward modifications that convert PCA to use a sliding window over the data do not work because of the difficulty of determining the optimal window size. Instead, we propose algorithmic enhancements that rely on spectral analysis to improve dimensionality reduction. Results show that our methods can successfully capture the changing distribution of data in a real-time scenario, thus enabling real-time PCA.
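The windowed setting described in the abstract can be illustrated with a minimal sketch. It is not the authors' algorithm: it shows only the straightforward sliding-window approach, refitting the eigenspace whenever the reconstruction error of an incoming row exceeds a threshold, which the abstract reports to be sensitive to the choice of window size. The class name `WindowedPCA` and the parameters `n_components`, `window_size`, and `error_threshold` are hypothetical names chosen for illustration, and NumPy is assumed to be available.

```python
from collections import deque

import numpy as np


class WindowedPCA:
    """Illustrative sliding-window PCA (hypothetical, not the paper's method).

    Keeps the most recent rows in a fixed-size window and refits the
    principal subspace when an incoming row is reconstructed poorly.
    """

    def __init__(self, n_components, window_size, error_threshold):
        self.k = n_components
        self.window = deque(maxlen=window_size)  # oldest rows drop out automatically
        self.threshold = error_threshold         # assumed squared-error cutoff
        self.mean = None
        self.basis = None                        # shape (d, k), orthonormal columns
        self.refits = 0

    def _refit(self):
        # Recompute the mean and the top-k right singular vectors of the window.
        X = np.vstack(list(self.window))
        self.mean = X.mean(axis=0)
        _, _, vt = np.linalg.svd(X - self.mean, full_matrices=False)
        self.basis = vt[: self.k].T
        self.refits += 1

    def reconstruction_error(self, x):
        # Squared distance between the centered row and its projection.
        centered = x - self.mean
        residual = centered - self.basis @ (self.basis.T @ centered)
        return float(residual @ residual)

    def update(self, row):
        """Add one row; return its k-dimensional projection (None during warm-up)."""
        x = np.asarray(row, dtype=float)
        self.window.append(x)
        if self.basis is None:
            if len(self.window) <= self.k:
                return None  # not enough rows yet to fit k components
            self._refit()
        elif self.reconstruction_error(x) > self.threshold:
            self._refit()
        return self.basis.T @ (x - self.mean)


# Usage: a stream whose dominant direction switches from the x-axis to the y-axis.
model = WindowedPCA(n_components=1, window_size=10, error_threshold=0.01)
stream = [((-1.0) ** i, 0.0) for i in range(40)] + [(0.0, (-1.0) ** i) for i in range(40)]
projections = [model.update(row) for row in stream]
```

In this sketch, `error_threshold` and `window_size` must be tuned by hand, which is the difficulty the abstract attributes to plain sliding-window approaches.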
Funder
ICT Division Innovation Fund
Publisher
Association for Computing Machinery (ACM)
Reference
82 articles.
Cited by
3 articles.
1. Optimal Matrix Sketching over Sliding Windows;Proceedings of the VLDB Endowment;2024-05
2. A Review Paper Comparing the CNN, LBPH, and PCA Face Recognition Algorithms;2023 7th International Conference on Computation System and Information Technology for Sustainable Solutions (CSITSS);2023-11-02
3. Online Component Analysis, Architectures and Applications;Foundations and Trends® in Signal Processing;2022