Relational paraphrase acquisition from Wikipedia: The WRPA method and corpus

Author:

VILA M., RODRÍGUEZ H., MARTÍ M. A.

Abstract

Paraphrase corpora are an essential but scarce resource in Natural Language Processing. In this paper, we present the Wikipedia-based Relational Paraphrase Acquisition (WRPA) method, which extracts relational paraphrases from Wikipedia, and the derived WRPA paraphrase corpus. The WRPA corpus currently covers person-related and authorship relations in English and Spanish, respectively, suggesting that, given adequate Wikipedia coverage, our method is independent of the language and the relation addressed. WRPA extracts entity pairs from structured information in Wikipedia applying distant learning and, based on the distributional hypothesis, uses them as anchor points for candidate paraphrase extraction from the free text in the body of Wikipedia articles. Focussing on relational paraphrasing and taking advantage of Wikipedia-structured information allows for an automatic and consistent evaluation of the results. The WRPA corpus characteristics distinguish it from other types of corpora that rely on string similarity or transformation operations. WRPA relies on distributional similarity and is the result of the free use of language outside any reformulation framework. Validation results show a high precision for the corpus.
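
The anchor-point idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes the entity pair has already been read from structured data such as an infobox, uses plain substring matching in place of any real entity linking, and all names (`candidate_paraphrases`, `article`, `pair`) are hypothetical. It keeps each sentence that mentions both entities and abstracts them into `[X]` and `[Y]` slots.

```python
import re


def candidate_paraphrases(text, source, target):
    """Return sentence patterns in which both anchors of an entity pair occur."""
    # Naive sentence splitting on terminal punctuation (an assumption for brevity).
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    patterns = []
    for sentence in sentences:
        # Keep only sentences that contain both anchor entities.
        if source in sentence and target in sentence:
            # Replace the anchors with slots to obtain a relational pattern.
            pattern = sentence.replace(source, "[X]").replace(target, "[Y]")
            patterns.append(pattern)
    return patterns


# Hypothetical free text and a hypothetical infobox pair for a birthplace relation.
article = (
    "Ada Lovelace was born in London. "
    "She worked with Charles Babbage. "
    "London was the birthplace of Ada Lovelace."
)
pair = ("Ada Lovelace", "London")
result = candidate_paraphrases(article, *pair)
print(result)
```

Here the two retained patterns express the same relation with different wording, which is the sense in which they are candidate paraphrases; a real pipeline would still need filtering and validation.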

Publisher

Cambridge University Press (CUP)

Subject

Artificial Intelligence, Linguistics and Language, Language and Linguistics, Software

Cited by 4 articles.

1. Urdu Short Paraphrase Detection at Sentence Level;ACM Transactions on Asian and Low-Resource Language Information Processing;2023-04-12

2. ArgRewrite V.2: an annotated argumentative revisions corpus;Language Resources and Evaluation;2022-01-13

3. Creating Paraphrase Identification Corpus for Indian Languages;Handbook of Research on Emerging Trends and Applications of Machine Learning;2020

4. Corpus annotation with paraphrase types: new annotation scheme and inter-annotator agreement measures;Language Resources and Evaluation;2014-07-02
