PhrasIS: Phrase Inference and Similarity benchmark

Author:

Lopez-Gazpio I (1), Gaviria J (2), García P (2), Sanjurjo-González H (2), Sanz B (3), Zarranz A (3), Maritxalar M (3), Agirre E (3)

Affiliation:

1. TIEC Department, University of Mundaiz kalea 50, Donostia 20012, Spain, inigo.lopez@ehu.eus

2. TIEC Department, University of Mundaiz kalea 50, Donostia 20012, Spain

3. HiTZ Basque Center for Language Technologies, Ixa NLP Group, University of the Basque Country (UPV/EHU), M. Lardizabal 1, Donostia 20018, Spain

Abstract

We present PhrasIS, a benchmark dataset composed of naturally occurring Phrase pairs with Inference and Similarity annotations for the evaluation of semantic representations. The dataset fills the gap between word-level and sentence-level datasets, allowing compositional models to be evaluated at a finer granularity than sentences. Unlike other datasets, the phrase pairs are extracted from naturally occurring text in image captions and news headlines. All text fragments were annotated by experts following a rigorous process, also described in the manuscript, that achieved high inter-annotator agreement. In this work we analyse the dataset, showing the relation between inference labels and similarity scores. With 10K phrase pairs split into development and test sets, the dataset is an excellent benchmark for testing meaning representation systems.

Publisher

Oxford University Press (OUP)

