Naive Bayesian Prediction of Japanese Annotated Corpus for Textual Semantic Word Formation Classification

Author:

Hao Zhoushao¹

Affiliation:

1. Luoyang Normal University, Luoyang, Henan 471934, China

Abstract

With the rapid development of Japanese information processing technology, problems such as polysemy and ambiguity at the text and dialogue level, as well as unregistered (out-of-vocabulary) words, have become increasingly prominent because computers cannot fully “understand” the semantics of words. For a computer to “understand” word semantics accurately, it must “understand”, from a semantic perspective, the rules by which words and word components combine into new words. Traditional Japanese text classification mostly represents texts with the vector space model, which tends to produce confused classification results. This paper therefore proposes building a semantic word formation pattern prediction model from a large-scale annotated corpus, using a solution that combines Japanese semantic word formation rules with pattern recognition algorithms. Several pattern recognition algorithms were compared for this scheme, and the naive Bayes model was selected to predict semantic word formation patterns. The accuracy of computer prediction of Japanese semantic word formation patterns is further improved by adding part-of-speech information. Before modeling, the words in the original annotated corpus are automatically tagged with parts of speech, and the tags are manually checked. A naive Bayes prediction model of semantic word formation patterns is then built and evaluated in simulation experiments. The eight types of semantic word formation patterns in the annotated corpus are divided into two groups, and each group's sample set is split into a training set and a test set. The naive Bayes model first learns semantic word formation rules from each group's training set and then predicts semantic word formation patterns on that group's test set.
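The abstract does not specify the model's features or implementation. As a minimal sketch, assuming each word is represented by its position-tagged characters plus its part-of-speech tag, a multinomial naive Bayes classifier with Laplace smoothing can learn from one group's training set and then predict patterns on that group's test set. The `features` helper, the `NaiveBayes` class, the POS tags, the pattern labels, and the toy words are all hypothetical illustrations, not the paper's actual scheme or data.

```python
import math
from collections import Counter, defaultdict


def features(word, pos):
    """Hypothetical features: position-tagged characters plus the POS tag."""
    feats = [f"c{i}={ch}" for i, ch in enumerate(word)]
    feats.append(f"pos={pos}")
    return feats


class NaiveBayes:
    """Multinomial naive Bayes with add-alpha (Laplace) smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def fit(self, samples):
        # samples: list of (feature_list, pattern_label) pairs
        self.class_counts = Counter(label for _, label in samples)
        self.feat_counts = defaultdict(Counter)
        for feats, label in samples:
            self.feat_counts[label].update(feats)
        self.vocab = {f for feats, _ in samples for f in feats}
        self.n_samples = len(samples)
        return self

    def log_posterior(self, feats, label):
        # log P(label) + sum of log P(feature | label), up to a shared constant
        counts = self.feat_counts[label]
        denom = sum(counts.values()) + self.alpha * len(self.vocab)
        score = math.log(self.class_counts[label] / self.n_samples)
        for f in feats:
            if f in self.vocab:  # features never seen in training are ignored
                score += math.log((counts[f] + self.alpha) / denom)
        return score

    def predict(self, feats):
        # Pick the pattern label with the highest log posterior
        return max(self.class_counts, key=lambda c: self.log_posterior(feats, c))


# Hypothetical toy data for ONE group: (word, part-of-speech, pattern label).
# Labels and POS tags are illustrative only, not the paper's eight patterns.
groups = {
    "group_1": {
        "train": [
            ("高山", "noun", "modifier-head"),
            ("高原", "noun", "modifier-head"),
            ("青空", "noun", "modifier-head"),
            ("読書", "sahen-noun", "verb-object"),
            ("作文", "sahen-noun", "verb-object"),
        ],
        "test": [
            ("青山", "noun", "modifier-head"),
            ("作曲", "sahen-noun", "verb-object"),
        ],
    },
}

# Train one model per group, then predict that group's test words
predictions = {}
for name, split in groups.items():
    model = NaiveBayes().fit([(features(w, p), y) for w, p, y in split["train"]])
    predictions[name] = [model.predict(features(w, p)) for w, p, _ in split["test"]]

print(predictions)
```

Here the part-of-speech tag is simply one more factor in the product of per-feature likelihoods, which is one straightforward way to "add part of speech"; the paper may incorporate it differently.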
The simulation results show that the semantic word formation pattern prediction model generally achieves a high degree of fit and high prediction accuracy, so a prediction model built on this approach can help a computer judge semantic word formation patterns more accurately.
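The abstract reports a "degree of fit" and a "prediction accuracy" without defining them. Assuming fit means accuracy on the training set and prediction accuracy means accuracy on the held-out test set, both can be computed as in the sketch below, and a count of (gold, predicted) pairs shows which patterns are confused with one another. The `accuracy` and `confusion` helpers and the label lists are hypothetical placeholders, not results from the paper.

```python
from collections import Counter


def accuracy(gold, predicted):
    """Fraction of items whose predicted pattern matches the gold pattern."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    if not gold:
        raise ValueError("empty evaluation set")
    return sum(g == p for g, p in zip(gold, predicted)) / len(gold)


def confusion(gold, predicted):
    """Count (gold, predicted) pairs; pairs with g != p are confusions."""
    return Counter(zip(gold, predicted))


# Hypothetical labels for illustration only; not data from the paper.
train_gold = ["coordinate", "modifier-head", "modifier-head", "verb-object"]
train_pred = ["coordinate", "modifier-head", "modifier-head", "verb-object"]
test_gold = ["coordinate", "modifier-head", "verb-object", "verb-object"]
test_pred = ["coordinate", "verb-object", "verb-object", "verb-object"]

fit = accuracy(train_gold, train_pred)      # assumed "degree of fit"
test_acc = accuracy(test_gold, test_pred)   # assumed "prediction accuracy"
print(fit, test_acc)
print(confusion(test_gold, test_pred))
```

A large gap between the two numbers would suggest overfitting to the training set, which is why reporting both is informative.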

Publisher

Hindawi Limited

Subject

General Engineering, General Mathematics