Zero-Inflated Patent Data Analysis Using Compound Poisson Models

Published:2023-04-02 Issue:7 Volume:13 Page:4505
ISSN:2076-3417
Container-title:Applied Sciences
language:en
Short-container-title:Applied Sciences

Author:

Park Sangsung¹, Jun Sunghae¹

Affiliation:

1. Department of Statistics, Cheongju University, Cheongju 28503, Republic of Korea

Abstract

A large part of big data consists of text documents such as papers, patents, or articles. To analyze text data, we have to preprocess the text documents and build structured data in the form of a document-word matrix using various text mining techniques, because the statistical and machine learning algorithms used in text analysis require structured training data. The rows and columns of the matrix correspond to documents and words, respectively, and each element is the frequency of a word in a document. In general, because the number of words is much larger than the number of documents, most elements are zero. The sparsity problem caused by these inflated zeros degrades the performance of predictive models. In this paper, we propose a method to solve the sparsity problem and improve model performance in text data analysis. The proposed method is based on compound Poisson linear modeling. To show its performance, we collect and analyze patent documents from patent databases. In our experiments, we compare the Akaike information criterion (AIC) of the proposed model with that of traditional models: the linear model, the generalized linear model, and the zero-inflated Poisson model. The AIC of the proposed model is smaller than that of the other models, which supports the validity of the proposed method.
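
The document-word matrix and its sparsity, as described in the abstract, can be illustrated with a minimal sketch. The toy documents, the whitespace tokenizer, and the function names `document_word_matrix` and `zero_fraction` are assumptions made for illustration; they are not the authors' patent data or preprocessing pipeline.

```python
from collections import Counter

# Toy documents (hypothetical; not taken from the paper's patent data).
docs = [
    "patent claims a sensor device",
    "sensor network for vehicle control",
    "battery cell with polymer electrolyte",
]

def document_word_matrix(documents):
    """Return (vocabulary, matrix); matrix[i][j] counts word j in document i."""
    # Naive whitespace tokenization, assumed here only to keep the sketch short.
    counts = [Counter(doc.lower().split()) for doc in documents]
    vocabulary = sorted(set().union(*counts))
    matrix = [[c[word] for word in vocabulary] for c in counts]
    return vocabulary, matrix

def zero_fraction(matrix):
    """Share of matrix elements that are exactly zero."""
    cells = [value for row in matrix for value in row]
    return sum(1 for value in cells if value == 0) / len(cells)

vocabulary, matrix = document_word_matrix(docs)
sparsity = zero_fraction(matrix)
print(len(docs), "documents x", len(vocabulary), "words")
print(f"zero fraction: {sparsity:.2f}")
```

Even with three short documents, most cells are zero because each document uses only a small part of the shared vocabulary. This is the zero inflation that the abstract identifies as the source of the sparsity problem.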

Funder

National Research Foundation of Korea

Publisher

MDPI AG

Subject

Fluid Flow and Transfer Processes, Computer Science Applications, Process Chemistry and Technology, General Engineering, Instrumentation, General Materials Science

Link

https://www.mdpi.com/2076-3417/13/7/4505/pdf

Reference: 49 articles.

1. Personality Prediction Based on Text Analytics Using Bidirectional Encoder Representations from Transformers from English Twitter Dataset;Arijanto;Int. J. Fuzzy Log. Intell. Syst.,2021

2. Developing a Big Data Analytic Model and a Platform for Particulate Matter Prediction: A Case Study;Kim;Int. J. Fuzzy Log. Intell. Syst.,2019

3. Constructing Efficient Regional Hazardous Weather Prediction Models through Big Data Analysis;Lee;Int. J. Fuzzy Log. Intell. Syst.,2016

4. Automatic Switching of Clustering Methods based on Fuzzy Inference in Bibliographic Big Data Retrieval System;Zolkepli;Int. J. Fuzzy Log. Intell. Syst.,2014

5. Text mining infrastructure in R;Feinerer;J. Stat. Softw.,2008

Cited by 3 articles.

1. Patent Keyword Analysis Using Bayesian Zero-Inflated Model and Text Mining;Stats;2024-08-03

2. Keyword Data Analysis Using Generative Models Based on Statistics and Machine Learning Algorithms;Electronics;2024-02-19

3. Zero-Inflated Text Data Analysis using Generative Adversarial Networks and Statistical Modeling;Computers;2023-12-10