Semantic Augmentation in Chinese Adversarial Corpus for Discourse Relation Recognition Based on Internal Semantic Elements

Published: 2024-05-15 Issue: 10 Volume: 13 Page: 1944
ISSN: 2079-9292
Container-title: Electronics
Language: en
Short-container-title: Electronics

Author:

Hua Zheng¹,², Yang Ruixia², Feng Yanbin², Yin Xiaojun²,³

Affiliation:

1. Key Laboratory of Computational Linguistics, Department of Chinese Language and Literature, Peking University, Beijing 100871, China

2. School of Information Science, Beijing Language and Culture University, Beijing 100083, China

3. Research Institute of International Chinese Language Education, Beijing Language and Culture University, Beijing 100083, China

Abstract

This paper proposes incorporating linguistic semantic information into discourse relation recognition and constructs a Semantic Augmented Chinese Adversative Corpus (SACA) comprising 9546 adversative complex sentences. For adversative complex sentences, we propose a quadruple (P, Q, R, Qβ) representing their internal semantic elements. The semantic opposition between Q and Qβ forms the basis of the adversative relationship, P denotes the premise, and R represents the adversative reason. The overall annotation scheme of the corpus follows the Penn Discourse Treebank (PDTB), except for sense classification: combining insights from the Chinese Discourse Treebank (CDTB), we obtained eight sense categories for Chinese adversative complex sentences. Based on this corpus, we explore the relationship between sense classification and internal semantic elements within our newly proposed Chinese Adversative Discourse Relation Recognition (CADRR) task. Using deep learning techniques, we built a range of classification models, including one that uses internal semantic element features, and the results demonstrate their effectiveness and the applicability of the SACA corpus. By incorporating internal semantic element information, our model achieves state-of-the-art performance compared with pre-trained model baselines.
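To make the quadruple annotation concrete, here is a minimal Python sketch of how one SACA-style record might be held in memory. The class name `AdversativeInstance`, its field names, the integer encoding of the eight senses, and the `[P]`/`[Q]`/`[R]`/`[Qβ]` tag format produced by `to_tagged_input` are all hypothetical; the abstract does not describe the corpus file format or how the authors' model actually consumes the elements. All four elements are made optional on the assumption that a given sentence may leave some of them unannotated.

```python
from dataclasses import dataclass
from typing import Optional

# The abstract reports eight sense categories; their names are not given,
# so this sketch encodes them as integer indices 0..7 (hypothetical encoding).
NUM_SENSES = 8


@dataclass(frozen=True)
class AdversativeInstance:
    """Hypothetical in-memory record for one annotated adversative complex sentence."""
    text: str                     # the full complex sentence
    p: Optional[str] = None       # P: premise
    q: Optional[str] = None       # Q: one side of the semantic opposition
    r: Optional[str] = None       # R: adversative reason
    q_beta: Optional[str] = None  # Qβ: the element semantically opposed to Q
    sense: int = 0                # index into the eight sense categories

    def __post_init__(self) -> None:
        # Reject sense indices outside the eight assumed categories.
        if not 0 <= self.sense < NUM_SENSES:
            raise ValueError(f"sense must be in [0, {NUM_SENSES}), got {self.sense}")


def to_tagged_input(inst: AdversativeInstance) -> str:
    """Append each non-empty element to the sentence behind a marker tag (hypothetical format)."""
    parts = [inst.text]
    for tag, span in (("P", inst.p), ("Q", inst.q), ("R", inst.r), ("Qβ", inst.q_beta)):
        if span:  # skip elements that are missing or empty
            parts.append(f"[{tag}] {span}")
    return " ".join(parts)


if __name__ == "__main__":
    example = AdversativeInstance(text="s", p="a", q="b", sense=3)
    print(to_tagged_input(example))  # prints: s [P] a [Q] b
```

Serializing the elements as tagged text is only one option for exposing them to a classifier; a model could instead encode them as separate input segments or as span-level features.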

Funder

China Postdoctoral Science Foundation

Publisher

MDPI AG

Link

https://www.mdpi.com/2079-9292/13/10/1944/pdf

References (56 articles)

1. Marcu, D., and Echihabi, A. (2002). An unsupervised approach to recognizing discourse relations. Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics, Philadelphia, PA, USA.

2. Staliūnaitė, I., Gorinski, P.J., and Iacobacci, I. (2021). Improving commonsense causal reasoning by adversarial training and data augmentation. Proceedings of the AAAI Conference on Artificial Intelligence, Virtual.

3. Maruf, S., Saleh, F., and Haffari, G. (2021). A survey on document-level neural machine translation: Methods and evaluation. ACM Computing Surveys (CSUR).

4. Schick, T., and Schütze, H. (2020). It’s not just size that matters: Small language models are also few-shot learners. arXiv.

5. Prasad, R., Dinesh, N., Lee, A., Miltsakaki, E., Robaldo, L., Joshi, A.K., and Webber, B.L. (2008). The Penn Discourse TreeBank 2.0. Proceedings of LREC, Marrakech, Morocco.