Neural Data-to-Text Generation Based on Small Datasets: Comparing the Added Value of Two Semi-Supervised Learning Approaches on Top of a Large Language Model

Authors:

Chris van der Lee (1), Thiago Castro Ferreira (2), Chris Emmery (3), Travis J. Wiltshire (4), Emiel Krahmer (5)

Affiliations:

1. Tilburg University, Tilburg Center for Cognition and Communication. Email: c.vdrlee@uvt.nl

2. Universidade Federal de Minas Gerais, Faculdade de Letras. Email: thiagocf05@ufmg.br

3. Tilburg University, Department of Cognitive Science and Artificial Intelligence. Email: c.d.emmery@uvt.nl

4. Tilburg University, Department of Cognitive Science and Artificial Intelligence. Email: t.j.wiltshire@uvt.nl

5. Tilburg University, Tilburg Center for Cognition and Communication. Email: e.j.krahmer@uvt.nl

Abstract

This study discusses the effect of semi-supervised learning in combination with pretrained language models for data-to-text generation. It is not known whether semi-supervised learning is still helpful when a large-scale language model is also supplemented. This study aims to answer this question by comparing a data-to-text system only supplemented with a language model, to two data-to-text systems that are additionally enriched by a data augmentation or a pseudo-labeling semi-supervised learning approach. Results show that semi-supervised learning results in higher scores on diversity metrics. In terms of output quality, extending the training set of a data-to-text system with a language model using the pseudo-labeling approach did increase text quality scores, but the data augmentation approach yielded similar scores to the system without training set extension. These results indicate that semi-supervised learning approaches can bolster output quality and diversity, even when a language model is also present.

Publisher

MIT Press

Subject

Artificial Intelligence, Computer Science Applications, Linguistics and Language, Language and Linguistics
