Zero-Inflated Patent Data Analysis Using Generating Synthetic Samples

Published: 2022-07-16, Volume: 14, Issue: 7, Page: 211
ISSN: 1999-5903
Journal: Future Internet
Language: en

Author:

Uhm Daiho, Jun Sunghae

Abstract

With the expansion of the internet, we encounter various types of big data, such as web documents and sensing data. Compared to traditional small data such as experimental samples, big data offer more opportunities to find hidden and novel patterns through analysis with statistics and machine learning algorithms. However, as the use of big data increases, problems also arise. One of them is the zero-inflated problem in structured data preprocessed from big data: most count values are zeros because a specific word is found in only some documents. Because most patent data take the form of text documents, they are especially affected by the zero-inflated problem. To solve this problem, we propose generating synthetic samples using statistical inference and a tree structure. Using patent documents and simulation data, we verify the performance and validity of our proposed method. In this paper, we focus on patent keyword analysis as a form of text big data analysis, where we encounter the zero-inflated problem just as with other text data.
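
To make the zero-inflated problem described in the abstract concrete, the following is a minimal sketch of how a document-keyword count matrix ends up dominated by zeros. It is an illustration only and does not reproduce the authors' synthetic-sample method. The toy corpus and the helper names `document_keyword_matrix` and `zero_proportion` are hypothetical and do not come from the paper.

```python
from collections import Counter

# Hypothetical toy corpus standing in for preprocessed patent texts
# (an assumption for illustration, not data from the paper).
docs = [
    "battery electrode lithium cell",
    "antenna signal wireless receiver",
    "lithium battery charging circuit",
    "image sensor pixel array",
]


def document_keyword_matrix(documents):
    """Return (vocabulary, matrix) where matrix[i][j] counts word j in document i."""
    tokenized = [doc.split() for doc in documents]
    vocabulary = sorted({word for tokens in tokenized for word in tokens})
    matrix = []
    for tokens in tokenized:
        counts = Counter(tokens)  # missing words count as 0
        matrix.append([counts[word] for word in vocabulary])
    return vocabulary, matrix


def zero_proportion(matrix):
    """Share of cells in the count matrix that are exactly zero."""
    cells = [value for row in matrix for value in row]
    return sum(1 for value in cells if value == 0) / len(cells)


vocabulary, matrix = document_keyword_matrix(docs)
zero_share = zero_proportion(matrix)
print(f"{len(vocabulary)} keywords, {zero_share:.1%} of counts are zero")
```

Because each keyword appears in only a few documents, most cells are zero even in this small corpus, and the share typically grows as the vocabulary and the number of documents increase.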

Publisher

MDPI AG

Subject

Computer Networks and Communications

Link

https://www.mdpi.com/1999-5903/14/7/211/pdf

References: 33 articles.

1. Regression Analysis of Count Data; Cameron, 2013

2. Zero-Inflated Poisson and Negative Binomial Regressions for Technology Analysis

3. A comparison of zero-inflated and hurdle models for modeling zero-inflated count data

4. Modeling Overdispersion, Autocorrelation, and Zero-Inflated Count Data Via Generalized Additive Models and Bayesian Statistics in an Aphid Population Study

5. Negative Binomial Regression; Hilbe, 2011

Cited by 3 articles.

1. Keyword Data Analysis Using Generative Models Based on Statistics and Machine Learning Algorithms; Electronics; 2024-02-19

2. Estimation of Uncertainty for Technology Evaluation Factors via Bayesian Neural Networks; Axioms; 2023-01-31

3. Text Data Analysis Using Generalized Linear Mixed Model and Bayesian Visualization; Axioms; 2022-11-26