Semantic Augmentation in Chinese Adversarial Corpus for Discourse Relation Recognition Based on Internal Semantic Elements
Published: 2024-05-15
Issue: 10
Volume: 13
Page: 1944
ISSN: 2079-9292
Container-title: Electronics
Language: en
Short-container-title: Electronics
Author:
Hua Zheng (1, 2), Yang Ruixia (2), Feng Yanbin (2), Yin Xiaojun (2, 3)
Affiliation:
1. Key Laboratory of Computational Linguistics, Department of Chinese Language and Literature, Peking University, Beijing 100871, China
2. School of Information Science, Beijing Language and Culture University, Beijing 100083, China
3. Research Institute of International Chinese Language Education, Beijing Language and Culture University, Beijing 100083, China
Abstract
This paper proposes incorporating linguistic semantic information into discourse relation recognition and constructs the Semantic Augmented Chinese Adversative Corpus (SACA), which comprises 9546 adversative complex sentences. For adversative complex sentences, we propose a quadruple (P, Q, R, Qβ) of internal semantic elements: P denotes the premise, R represents the adversative reason, and the semantic opposition between Q and Qβ forms the basis of the adversative relationship. The overall annotation approach of the corpus follows the Penn Discourse Treebank (PDTB), except for the classification of senses, where we draw on the Chinese Discourse Treebank (CDTB) to obtain eight sense categories for Chinese adversative complex sentences. Based on this corpus, we explore the relationship between sense classification and internal semantic elements within our newly proposed Chinese Adversative Discourse Relation Recognition (CADRR) task. Using deep learning techniques, we built several classification models, including one that uses internal semantic element features; the results demonstrate the effectiveness of these features and the applicability of the SACA corpus. Compared with pre-trained models, the model that incorporates internal semantic element information achieves state-of-the-art performance.
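As a rough illustration of the quadruple (P, Q, R, Qβ) described in the abstract, the following Python sketch shows one way such an annotation could be held in memory. It is not the authors' data format: the class name `AdversativeQuadruple`, its field names, the `[NONE]` placeholder, and the assumption that an element may be absent are all hypothetical, and the example sentence fragments are invented for illustration rather than taken from SACA.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class AdversativeQuadruple:
    """Hypothetical container for the (P, Q, R, Qβ) elements of one sentence."""

    p: Optional[str]       # P: the premise
    q: Optional[str]       # Q: the element semantically opposed to Qβ
    r: Optional[str]       # R: the adversative reason
    q_beta: Optional[str]  # Qβ: the element semantically opposed to Q

    def has_opposition(self) -> bool:
        # The adversative relation rests on the Q / Qβ opposition,
        # so both elements must be present for it to be explicit.
        return self.q is not None and self.q_beta is not None

    def as_features(self, missing: str = "[NONE]") -> List[str]:
        # Fixed-order list of the four elements; absent elements are
        # replaced by a placeholder token (an assumption of this sketch).
        elements = (self.p, self.q, self.r, self.q_beta)
        return [e if e is not None else missing for e in elements]


# Invented example, segmented by hand purely for illustration.
example = AdversativeQuadruple(
    p="it rained all morning",
    q="the match would be cancelled",
    r="the pitch drains well",
    q_beta="the match went ahead",
)

print(example.has_opposition())
print(example.as_features())
```

A fixed-order list such as the one returned by `as_features` is one simple way to pass the elements to a classifier alongside the sentence itself; how the paper's model actually encodes them is not stated in the abstract.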
Funder
China Postdoctoral Science Foundation