Using natural language generation to bootstrap missing Wikipedia articles: A human-centric perspective

Published: 2022-02-03 Issue: 2 Volume: 13 Page: 163-194
ISSN: 2210-4968
Container-title: Semantic Web
Short-container-title: SW

Author:

Kaffee Lucie-Aimée¹, Vougiouklis Pavlos², Simperl Elena³

Affiliation:

1. School of Electronics and Computer Science, University of Southampton, UK. E-mail: kaffee@soton.ac.uk

2. Huawei Technologies, UK. E-mail: pavlos.vougiouklis@huawei.com

3. King’s College London, UK. E-mail: elena.simperl@kcl.ac.uk

Abstract

Nowadays natural language generation (NLG) is used in everything from news reporting and chatbots to social media management. Recent advances in machine learning have made it possible to train NLG systems that seek to achieve human-level performance in text writing and summarisation. In this paper, we propose such a system in the context of Wikipedia and evaluate it with Wikipedia readers and editors. Our solution builds upon the ArticlePlaceholder, a tool used in 14 under-resourced Wikipedia language versions, which displays structured data from the Wikidata knowledge base on empty Wikipedia pages. We train a neural network to generate an introductory sentence from the Wikidata triples shown by the ArticlePlaceholder, and explore how Wikipedia users engage with it. The evaluation, which includes an automatic, a judgement-based, and a task-based component, shows that the summary sentences score well in terms of perceived fluency and appropriateness for Wikipedia, and can help editors bootstrap new articles. It also hints at several potential implications of using NLG solutions in Wikipedia at large, including content quality, trust in technology, and algorithmic transparency.

Publisher

IOS Press

Subject

Computer Networks and Communications, Computer Science Applications, Information Systems

Cited by 2 articles.

1. Detecting Cross-Lingual Information Gaps in Wikipedia;Companion Proceedings of the ACM Web Conference 2023;2023-04-30

2. What Really Matters in a Table? Insights from a User Study;2022 IEEE/WIC/ACM International Joint Conference on Web Intelligence and Intelligent Agent Technology (WI-IAT);2022-11