Affiliation:
1. Department of Computer Science, School of Science, Loughborough University, UK
2. Department of Information Engineering, Faculty of Engineering, The Chinese University of Hong Kong, China
Abstract
Classical Chinese poetry, an essential part of cultural heritage, exhibits rich thematic diversity that is often overlooked in natural language processing research. To address this gap, we explore the classification of thematic categories within this literary domain. We curate a dataset of 2,918 annotated poems spanning seven common themes and propose a BERT-based ensemble learning approach for classifying them. Although this method integrates existing models, it achieves an accuracy and F1 score of over 72% on the 7-class task, surpassing established baselines and providing a reference point for future research. The experimental findings show that ensemble strategies improve on the performance of the individual base models and highlight the potential of the MLP-based ensemble technique. The study contributes to a deeper understanding of thematic categories and textual features in classical Chinese poetry and offers an automated system for classifying classical Chinese poems.
Publisher
Association for Computing Machinery (ACM)