Building a Turkish UCCA dataset

Published:2024-08-27 Page:1-39
ISSN:2977-0424
Container-title:Natural Language Processing
language:en
Short-container-title:Nat. lang. processing

Author:

Bölücü Necva, Can Burcu

Abstract

Semantic representation is the task of conveying the meaning of a natural language utterance by converting it to a logical form that can be processed and understood by machines. It is utilised by many applications in natural language processing (NLP), particularly in tasks relevant to natural language understanding (NLU). Due to the widespread use of semantic parsing in NLP, many semantic representation schemes with different forms have been proposed; Universal Conceptual Cognitive Annotation (UCCA) is one of them. UCCA is a cross-lingual semantic annotation framework that allows easy annotation without requiring substantial linguistic knowledge. UCCA-annotated datasets have been released so far for English, French, German, Russian, and Hebrew. In this paper, we present a UCCA-annotated Turkish dataset of 400 sentences obtained from the METU-Sabanci Turkish Treebank. We provide the UCCA annotation specifications defined for the Turkish language so that the dataset can be extended further. We followed a semi-automatic annotation approach, in which an external semantic parser produces the initial annotation of the dataset, which is then manually revised by two annotators. We used the same semantic parser model to evaluate the dataset with zero-shot and few-shot learning, demonstrating that even a small sample set from the target language in the training data has a notable impact on the performance of the parser (15.6% and 2.5% gain over zero-shot for labelled and unlabelled results, respectively).

Publisher

Cambridge University Press (CUP)
