Experimental Study of Morphological Analyzers for Topic Categorization in News Articles

Published: 2023-09-22 Issue: 19 Volume: 13 Page: 10572
ISSN: 2076-3417
Container-title: Applied Sciences
Language: en
Short-container-title: Applied Sciences

Author:

Ahn Sangtae¹,²

Affiliation:

1. School of Electronics Engineering, Kyungpook National University, Daegu 41566, Republic of Korea

2. School of Electronic and Electrical Engineering, Kyungpook National University, Daegu 41566, Republic of Korea

Abstract

Natural language processing refers to the ability of computers to understand text and spoken words as humans do. Recently, various machine learning techniques have been used successfully to encode large amounts of text and decode feature vectors from it. However, research on understanding low-resource languages is still at an early stage. In particular, Korean, an agglutinative language, requires sophisticated preprocessing steps such as morphological analysis. Because morphological analysis during preprocessing significantly influences classification results, a well-suited and optimized morphological analyzer must be used. This study explored five state-of-the-art morphological analyzers on Korean news articles and categorized the articles' topics into seven classes using term frequency–inverse document frequency (TF-IDF) features and the light gradient boosting machine (LightGBM) framework. A morphological analyzer based on unsupervised learning processed 500,899 tokens in 6 s, which is 72 times faster than the slowest analyzer (432 s). In addition, a morphological analyzer using dynamic programming achieved a topic categorization accuracy of 82.5%, which is 9.4 percentage points higher than that achieved with an analyzer based on the hidden Markov model (73.1%) and 13.4 percentage points higher than the baseline without any morphological analyzer (69.1%). This study provides insight into how each morphological analyzer extracts morphemes from sentences and how this affects the categorization of news article topics.
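To illustrate why the choice of analyzer matters for TF-IDF features, here is a minimal sketch under stated assumptions. It is not the study's pipeline: the particle-stripping tokenizer (`naive_particle_tokenize`) is a hypothetical toy stand-in for a real Korean morphological analyzer, the three example sentences are made up, and the classifier stage (LightGBM in the study) is omitted. The sketch only shows that splitting off particles lets inflected forms such as "학교에" and "학교가" share the feature "학교", whereas a whitespace-only baseline treats them as unrelated terms.

```python
import math
from collections import Counter

# Hypothetical toy "analyzer": strips one common Korean particle (josa) from the
# end of each space-separated eojeol. It only illustrates the idea; it is NOT one
# of the analyzers evaluated in the study and mis-handles many real words.
PARTICLES = ("에서", "으로", "은", "는", "이", "가", "을", "를", "에", "의", "로")


def whitespace_tokenize(text):
    """Baseline without morphological analysis: split on spaces only."""
    return text.split()


def naive_particle_tokenize(text):
    """Drop at most one trailing particle per eojeol (two-syllable ones checked first)."""
    tokens = []
    for eojeol in text.split():
        for p in PARTICLES:
            if eojeol.endswith(p) and len(eojeol) > len(p):
                eojeol = eojeol[: -len(p)]
                break
        tokens.append(eojeol)
    return tokens


def tfidf(docs, tokenize):
    """Return one {term: weight} dict per document.

    tf = count / document length, idf = log(N / df) (unsmoothed variant).
    """
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    df = Counter(term for toks in tokenized for term in set(toks))
    vectors = []
    for toks in tokenized:
        counts = Counter(toks)
        vectors.append({t: (c / len(toks)) * math.log(n_docs / df[t])
                        for t, c in counts.items()})
    return vectors


def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


docs = ["학교에 갔다", "학교가 좋다", "경기가 끝났다"]
base = tfidf(docs, whitespace_tokenize)
morph = tfidf(docs, naive_particle_tokenize)
print(round(cosine(base[0], base[1]), 3))    # 0.0: "학교에" and "학교가" never match
print(round(cosine(morph[0], morph[1]), 3))  # 0.12: both reduce to "학교"
```

In a full pipeline, these per-document vectors would be fed to a classifier; a real analyzer would replace the toy tokenizer, and its segmentation quality and speed would then drive the accuracy and timing differences reported above.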

Funder

National Research Foundation of Korea

Publisher

MDPI AG

Subject

Fluid Flow and Transfer Processes, Computer Science Applications, Process Chemistry and Technology, General Engineering, Instrumentation, General Materials Science

Link

https://www.mdpi.com/2076-3417/13/19/10572/pdf

Cited by 2 articles.

1. Automated Scoring of Translations with BERT Models: Chinese and English Language Case Study;Applied Sciences;2024-02-26

2. An Artificial-Intelligence-Driven Spanish Poetry Classification Framework;Big Data and Cognitive Computing;2023-12-14