Abstract
Much of big data, including web documents, online posts, papers, patents, and articles, is in text form, so the analysis of text data is an important task in the big data domain. Many methods based on statistics or machine learning algorithms have been studied for text data analysis, and most of them are based on the generalized linear model (GLM). Under the GLM, text data analysis assumes that the error in the given data follows a Gaussian distribution. However, the GLM has shown limitations in the analysis of text data, particularly with sparse data, because preprocessed text data has a zero-inflated problem, that is, an excess of zero counts. To overcome these limitations, we propose a text data analysis method using the generalized linear mixed model (GLMM) and Bayesian visualization. The GLMM allows various probability distributions besides the Gaussian for the error terms and accounts for differences between observations through clustering. We also use Bayesian visualization to find meaningful associations between keywords. Lastly, we analyze text data retrieved from real domains and report the results to show the performance and validity of the proposed method.
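To make the sparseness concrete, the following is a minimal sketch, not the authors' pipeline, of how a document-term matrix built from preprocessed text ends up dominated by zeros. The toy corpus, the whitespace tokenizer, and the helper names `document_term_matrix` and `zero_fraction` are all assumptions introduced here for illustration.

```python
from collections import Counter

# Hypothetical toy corpus; the documents analyzed in the paper are not reproduced here.
docs = [
    "bayesian model for patent keyword analysis",
    "text mining of patent documents",
    "mixed model with random effects",
    "keyword visualization with bayesian network",
    "sparse text data and zero counts",
]


def document_term_matrix(documents):
    """Return (vocab, matrix), where matrix[i][j] counts term j in document i."""
    # Assumption: simple whitespace tokenization stands in for real preprocessing.
    tokenized = [doc.split() for doc in documents]
    vocab = sorted({token for tokens in tokenized for token in tokens})
    matrix = []
    for tokens in tokenized:
        counts = Counter(tokens)
        matrix.append([counts[term] for term in vocab])
    return vocab, matrix


def zero_fraction(matrix):
    """Share of cells in the matrix that are exactly zero."""
    cells = [cell for row in matrix for cell in row]
    return sum(cell == 0 for cell in cells) / len(cells)


vocab, dtm = document_term_matrix(docs)
sparsity = zero_fraction(dtm)
print(f"{len(docs)} documents, {len(vocab)} terms, zero cells: {sparsity:.1%}")
```

Even in this tiny example most cells are zero, and the share typically grows as the vocabulary grows. This is the kind of count data for which a Gaussian error assumption is a poor fit.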
Subject
Geometry and Topology, Logic, Mathematical Physics, Algebra and Number Theory, Analysis