Abstract
High dimensional data are growing rapidly in many disciplines, particularly in natural language processing. Natural language processing requires working with high dimensional matrices of word embeddings obtained from text data. These matrices are often sparse, in the sense that they contain many zero elements. Sparse principal component analysis is an advanced mathematical tool for analysing high dimensional data. In this paper, we study and apply sparse principal component analysis to natural language processing, where it can effectively handle large sparse matrices. We study several formulations of sparse principal component analysis, together with algorithms for implementing them. Our work is motivated and illustrated by a real text dataset. We find that sparse principal component analysis performs as well as ordinary principal component analysis in terms of accuracy and precision, while offering two major advantages: faster computation and easier interpretation of the principal components. These advantages are especially helpful in big data settings.
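To make the idea of sparse loadings concrete, the following is a minimal sketch of one simple formulation: a cardinality-constrained leading component computed by truncated power iteration. This is an illustrative choice and is not claimed to be one of the formulations or algorithms studied in the paper. The toy document-term matrix `docs`, the term names, and the function names are all hypothetical.

```python
import math

def covariance(rows):
    """Sample covariance matrix of a documents-by-terms matrix (list of rows)."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    centered = [[r[j] - means[j] for j in range(p)] for r in rows]
    return [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(p)]
            for i in range(p)]

def truncated_power_iteration(cov, k, iters=200):
    """Approximate the leading eigenvector of cov with at most k nonzero loadings.

    Each step multiplies by cov, keeps the k largest entries in absolute
    value, zeroes the rest, and rescales to unit length.
    """
    p = len(cov)
    # Assumption: start from the single term with the largest variance.
    start = max(range(p), key=lambda j: cov[j][j])
    v = [0.0] * p
    v[start] = 1.0
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(p)) for i in range(p)]
        keep = set(sorted(range(p), key=lambda j: abs(w[j]), reverse=True)[:k])
        w = [w[j] if j in keep else 0.0 for j in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    return v

# Hypothetical toy data: 6 documents, 6 term counts, mostly zeros.
terms = ["model", "data", "river", "song", "paint", "stone"]
docs = [
    [5, 4, 0, 1, 0, 0],
    [0, 0, 1, 0, 1, 0],
    [6, 5, 0, 0, 0, 1],
    [0, 1, 0, 1, 0, 0],
    [4, 5, 1, 0, 0, 0],
    [1, 0, 0, 0, 1, 1],
]

loadings = truncated_power_iteration(covariance(docs), k=2)
for term, weight in zip(terms, loadings):
    if weight != 0.0:
        print(term, round(weight, 3))
```

On this toy matrix the two nonzero loadings fall on the first two terms, so the component can be read directly as a combination of those terms. An ordinary principal component would generally assign a nonzero weight to every term, which is harder to interpret.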
Publisher
Springer Science and Business Media LLC
Subject
Artificial Intelligence; Statistics, Probability and Uncertainty; Computer Science Applications; Business, Management and Accounting (miscellaneous)
Cited by
11 articles.