Abstract
Word processing entails retrieval of a unitary yet multidimensional semantic representation (e.g., a lemon's colour, flavour, possible uses) and has been investigated in both cognitive neuroscience and artificial intelligence. To enable direct comparison of human and artificial semantic representations, and to support the use of natural language processing (NLP) for computational modelling of human understanding, a critical challenge is the development of benchmarks of appropriate size and complexity. Here we present a dataset probing semantic knowledge with a three-term semantic associative task: which of two target words is more closely associated with a given anchor (e.g., is lemon closer to squeezer or sour?). The dataset includes both abstract and concrete nouns, for a total of 10,107 triplets. For the 2,255 triplets with varying levels of agreement among NLP word embeddings, we additionally collected behavioural similarity judgments from 1,322 human raters. We hope that this openly available, large-scale dataset will be a useful benchmark for both computational and neuroscientific investigations of semantic knowledge.
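The three-term associative task described above can be sketched computationally: given an anchor word and two targets, an embedding model's judgment is the target whose vector is more similar to the anchor's. The cosine-similarity decision rule and the toy 3-dimensional vectors below are illustrative assumptions, not the paper's actual embeddings or procedure.

```python
# Sketch of a three-term associative judgment with word embeddings.
# The 3-d vectors are hypothetical toy values chosen for illustration;
# real embeddings (e.g., from a pretrained model) would be used in practice.
import numpy as np

def cosine(u, v):
    """Cosine similarity between two vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def closer_target(anchor, target_a, target_b, embeddings):
    """Return the target whose embedding is more similar to the anchor's."""
    sim_a = cosine(embeddings[anchor], embeddings[target_a])
    sim_b = cosine(embeddings[anchor], embeddings[target_b])
    return target_a if sim_a >= sim_b else target_b

# Toy embedding table (hypothetical values).
embeddings = {
    "lemon":    np.array([0.9, 0.8, 0.1]),
    "sour":     np.array([0.8, 0.9, 0.2]),
    "squeezer": np.array([0.1, 0.2, 0.9]),
}

choice = closer_target("lemon", "squeezer", "sour", embeddings)
```

Agreement across several embedding models on such judgments is what the authors used to select the 2,255 triplets for human rating.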
Funder
postdoctoral fellowship from the Institut de Valorisation des Données (IVADO) and funding from the Courtois NeuroMod Project
Publisher
Springer Science and Business Media LLC
Subject
Library and Information Sciences; Statistics, Probability and Uncertainty; Computer Science Applications; Education; Information Systems; Statistics and Probability
Cited by 1 article.