Decoding Word Embeddings with Brain-Based Semantic Features

Author:

Emmanuele Chersoni¹, Enrico Santus², Chu-Ren Huang³, Alessandro Lenci⁴

Affiliation:

1. The Hong Kong Polytechnic University, Department of Chinese and Bilingual Studies. emmanuele.chersoni@polyu.edu.hk

2. MIT Computer Science and Artificial Intelligence Laboratory. esantus@mit.edu

3. The Hong Kong Polytechnic University, Department of Chinese and Bilingual Studies. churen.huang@polyu.edu.hk

4. University of Pisa, Department of Philology, Literature and Linguistics. alessandro.lenci@unipi.it

Abstract

Word embeddings are vectorial semantic representations built with either counting or predicting techniques, aimed at capturing shades of meaning from word co-occurrences. Since their introduction, these representations have been criticized for lacking interpretable dimensions. This property of word embeddings limits our understanding of the semantic features they actually encode. Moreover, it contributes to the “black box” nature of the tasks in which they are used, since the reasons for word embedding performance often remain opaque to humans. In this contribution, we explore the semantic properties encoded in word embeddings by mapping them onto interpretable vectors, consisting of explicit and neurobiologically motivated semantic features (Binder et al. 2016). Our exploration takes into account different types of embeddings, including factorized count vectors and predict models (Skip-Gram, GloVe, etc.), as well as the most recent contextualized representations (i.e., ELMo and BERT). In our analysis, we first evaluate the quality of the mapping in a retrieval task, then we shed light on the semantic features that are better encoded in each embedding type. Finally, a large set of probing tasks is used to assess how the original and the mapped embeddings perform in discriminating semantic categories. For each probing task, we identify the most relevant semantic features, and we show that there is a correlation between embedding performance and how well the embeddings encode those features. This study sets itself as a step forward in understanding which aspects of meaning are captured by vector spaces, by proposing a new and simple method to carve human-interpretable semantic representations from distributional vectors.
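The core idea of the abstract, mapping distributional embeddings onto interpretable feature vectors and evaluating the mapping with a retrieval task, can be sketched as follows. This is a minimal illustration under stated assumptions: the estimator (ridge regression), the toy dimensionalities, and the synthetic data are all placeholders and not the paper's actual setup; Binder et al. (2016) define 65 semantic features, whereas only 14 are used here for brevity.

```python
# Sketch: learn a linear map from word embeddings to interpretable semantic
# feature vectors, then evaluate it with nearest-neighbour retrieval.
# All data here is synthetic; ridge regression is an assumed estimator.
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.metrics.pairwise import cosine_similarity

rng = np.random.default_rng(0)

n_words, emb_dim, n_features = 200, 50, 14   # toy sizes (Binder et al.: 65 features)
W_true = rng.normal(size=(emb_dim, n_features))      # hidden linear relation
X = rng.normal(size=(n_words, emb_dim))              # stand-in word embeddings
Y = X @ W_true + 0.1 * rng.normal(size=(n_words, n_features))  # noisy feature norms

train, test = np.arange(150), np.arange(150, 200)
model = Ridge(alpha=1.0).fit(X[train], Y[train])
Y_pred = model.predict(X[test])

# Retrieval task: for each held-out word, rank the gold feature vectors by
# cosine similarity to the predicted vector; record the rank of the gold item.
sims = cosine_similarity(Y_pred, Y[test])
ranks = [int(np.where(np.argsort(-sims[i]) == i)[0][0]) + 1
         for i in range(len(test))]
top1 = float(np.mean([r == 1 for r in ranks]))
print(f"top-1 retrieval accuracy on held-out words: {top1:.2f}")
```

In this sketch, a high top-1 accuracy indicates that the embedding space carries enough information to reconstruct the interpretable features linearly; comparing such scores across embedding types (count, predict, contextualized) is the kind of analysis the paper performs.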

Publisher

MIT Press - Journals

Subject

Artificial Intelligence, Computer Science Applications, Linguistics and Language, Language and Linguistics

