Turkish Data-to-Text Generation Using Sequence-to-Sequence Neural Networks-Reference-Cited by-同舟云学术

Turkish Data-to-Text Generation Using Sequence-to-Sequence Neural Networks

Published:2022-12-27 Issue:2 Volume:22 Page:1-27
ISSN:2375-4699
Container-title:ACM Transactions on Asian and Low-Resource Language Information Processing
language:en
Short-container-title:ACM Trans. Asian Low-Resour. Lang. Inf. Process.

Author:

Demir Seniz¹^ORCID

Affiliation:

1. Department of Computer Engineering, MEF University, Istanbul, Turkey

Abstract

End-to-end data-driven approaches lead to rapid development of language generation and dialogue systems. Despite the need for large amounts of well-organized data, these approaches jointly learn multiple components of the traditional generation pipeline without requiring costly human intervention. End-to-end approaches also enable the use of loosely aligned parallel datasets in system development by relaxing the degree of semantic correspondences between training data representations and text spans. However, their potential in Turkish language generation has not yet been fully exploited. In this work, we apply sequence-to-sequence (Seq2Seq) neural models to Turkish data-to-text generation where the input data given in the form of a meaning representation is verbalized. We explore encoder-decoder architectures with attention mechanism in unidirectional, bidirectional, and stacked recurrent neural network (RNN) models. Our models generate one-sentence biographies and dining venue descriptions using a crowdsourced dataset where all field value pairs that appear in meaning representations are fully captured in reference sentences. To support this work, we also explore the performances of our models on a more challenging dataset, where the content of a meaning representation is too large to fit into a single sentence, and hence content selection and surface realization need to be learned jointly. This dataset is retrieved by coupling introductory sentences of person-related Turkish Wikipedia articles with their contained infobox tables. Our empirical experiments on both datasets demonstrate that Seq2Seq models are capable of generating coherent and fluent biographies and venue descriptions from field value pairs. We argue that the wealth of knowledge residing in our datasets and the insights obtained from this study hold the potential to give rise to the development of new end-to-end generation approaches for Turkish and other morphologically rich languages.

Funder

TUBITAK-ARDEB

Publisher

Association for Computing Machinery (ACM)

Subject

General Computer Science

Link

https://dl.acm.org/doi/pdf/10.1145/3543826

Reference103 articles.

1. Burcu Karagol Ayan. 2000. Morphosyntactic generation of Turkish from predicate-argument structure. In Proceedings of the COLING Student Session. Association for Computational Linguistics, Saarbrucken, Germany.

2. Dzmitry Bahdanau, Kyunghyun Cho, and Yoshua Bengio. 2015. Neural machine translation by jointly learning to align and translate. In Proceedings of the 3rd International Conference on Learning Representations. OpenReview.net, San Diego, California.

3. Satanjeev Banerjee and Alon Lavie. 2005. METEOR: An automatic metric for MT evaluation with improved correlation with human judgments. In Proceedings of the ACL Workshop on Intrinsic and Extrinsic Evaluation Measures for Machine Translation and/or Summarization. Association for Computational Linguistics, Ann Arbor, Michigan, 65–72.

4. Collective content selection for concept-to-text generation

Cited by 1 articles. 订阅此论文施引文献订阅此论文施引文献，注册后可以免费订阅5篇论文的施引文献，订阅后可以查看论文全部施引文献

1. DRL-based dependent task offloading with delay-energy tradeoff in medical image edge computing;Complex & Intelligent Systems;2024-01-29