Assemble the shallow or integrate a deep? Toward a lightweight solution for glyph-aware Chinese text classification

Published: 2023-07-28 Issue: 7 Volume: 18 Page: e0289204
ISSN: 1932-6203
Container-title: PLOS ONE
Language: en
Short-container-title: PLoS ONE

Authors

Hou Jingrui, Wang Ping

Abstract

As hieroglyphic languages, such as Chinese, differ from alphabetic languages, researchers have always been interested in using internal glyph features to enhance semantic representation. However, the models used in such studies are becoming increasingly computationally expensive, even for simple tasks like text classification. In this paper, we aim to balance model performance and computation cost in glyph-aware Chinese text classification tasks. To address this issue, we propose a lightweight ensemble learning method for glyph-aware Chinese text classification (LEGACT) that consists of typical shallow networks as base learners and machine learning classifiers as meta-learners. Through model design and a series of experiments, we demonstrate that an ensemble approach integrating shallow neural networks can achieve comparable results even when compared to large-scale transformer models. The contribution of this paper includes a lightweight yet powerful solution for glyph-aware Chinese text classification and empirical evidence of the significance of glyph features for hieroglyphic language representation. Moreover, this paper emphasizes the importance of assembling shallow neural networks with proper ensemble strategies to reduce computational workload in predictive tasks.
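The abstract describes LEGACT only at a high level: shallow networks as base learners, with a machine learning classifier as meta-learner. The following is a minimal sketch of that general stacking pattern, not the authors' implementation. Everything in it is an assumption made for illustration: the `CentroidLearner` class is a stand-in for a shallow network, the two feature "views" (character-level and glyph-level vectors) are toy numbers, and the meta-learner is fitted on in-sample base outputs for brevity, whereas stacking is usually trained on held-out predictions.

```python
import math


class CentroidLearner:
    """Hypothetical stand-in for a shallow base network.

    Scores each class by how close a sample lies to that class's centroid.
    """

    def fit(self, X, y):
        self.classes = sorted(set(y))
        self.centroids = {}
        for c in self.classes:
            rows = [x for x, label in zip(X, y) if label == c]
            # Column-wise mean of all training rows belonging to class c.
            self.centroids[c] = [sum(col) / len(rows) for col in zip(*rows)]
        return self

    def predict_proba(self, X):
        out = []
        for x in X:
            scores = [math.exp(-math.dist(x, self.centroids[c])) for c in self.classes]
            total = sum(scores)
            out.append([s / total for s in scores])
        return out

    def predict(self, X):
        probas = self.predict_proba(X)
        return [self.classes[max(range(len(p)), key=p.__getitem__)] for p in probas]


def stack_features(learners, views):
    """Concatenate every base learner's class probabilities per sample."""
    probas = [learner.predict_proba(view) for learner, view in zip(learners, views)]
    return [sum(rows, []) for rows in zip(*probas)]


# Toy data (assumed): one character-level and one glyph-level vector per text.
char_view = [[0.9, 0.1], [0.8, 0.2], [0.2, 0.9], [0.1, 0.8]]
glyph_view = [[1.0, 0.0], [0.9, 0.2], [0.1, 1.0], [0.0, 0.9]]
labels = ["sports", "sports", "finance", "finance"]

# One base learner per view.
base_learners = [
    CentroidLearner().fit(char_view, labels),
    CentroidLearner().fit(glyph_view, labels),
]

# The meta-learner sees only the stacked base outputs, never the raw features.
# For brevity it is fitted on in-sample outputs; held-out folds are the usual choice.
meta_learner = CentroidLearner().fit(
    stack_features(base_learners, [char_view, glyph_view]), labels
)


def predict(char_x, glyph_x):
    """Classify one text from its character-level and glyph-level vectors."""
    meta_row = stack_features(base_learners, [[char_x], [glyph_x]])
    return meta_learner.predict(meta_row)[0]
```

In this arrangement each base learner stays small and cheap to train, and the meta-learner only has to combine a handful of class scores, which is the kind of cost trade-off the abstract argues for.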

Funder

National Natural Science Foundation of China

Publisher

Public Library of Science (PLoS)

Subject

Multidisciplinary

References (54 articles)

1. Gasparetto A. A survey on text classification: Practical perspectives on the Italian language. PLOS ONE; 2022.

2. Chen X, Xu L, Liu Z, Sun M, Luan H. Joint learning of character and word embeddings. In: Proceedings of the 24th International Conference on Artificial Intelligence. IJCAI’15. AAAI Press; 2015. p. 1236–1242. Available from: https://dl.acm.org/doi/10.5555/2832415.2832421.

3. Radical Enhanced Chinese Word Embedding

4. Shi X, Zhai J, Yang X, Xie Z, Liu C. Radical embedding: Delving deeper to Chinese radicals. In: Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 2: Short Papers). Beijing, China: Association for Computational Linguistics; 2015. p. 594–598. Available from: https://aclanthology.org/P15-2098.

5. Li Y, Li W, Sun F, Li S. Component-enhanced Chinese character embeddings. In: Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing. Lisbon, Portugal: Association for Computational Linguistics; 2015. p. 829–834. Available from: https://aclanthology.org/D15-1098.

Cited by 2 articles.

1. Exploring Thematic Diversity in Classical Chinese Poetry: A Novel Dataset and a BERT-enhanced Ensemble Learning Approach;Journal on Computing and Cultural Heritage;2024-08-07

2. Big Data in Art History: Exploring the Evolution of Dunhuang Artistic Style Through Archaeological Evidence;MEDITERR ARCHAEOL AR;2023