Domain-Specific Word Embeddings with Structure Prediction

Published: 2023-03-27 Volume: 11 Pages: 320-335
ISSN: 2307-387X
Container-title: Transactions of the Association for Computational Linguistics
Language: en

Author:

Lassner David (1, 2), Brandl Stephanie (3, 2, 4), Baillot Anne (5), Nakajima Shinichi (3, 2, 6)

Affiliation:

1. TU Berlin, Germany. lassner@tu-berlin.de

2. BIFOLD, Germany

3. TU Berlin, Germany

4. University of Copenhagen, Denmark. brandl@di.ku.dk

5. Le Mans Université, France

6. RIKEN Center for AIP, Japan

Abstract

Complementary to finding good general word embeddings, an important question for representation learning is to find dynamic word embeddings, for example, across time or domain. Current methods do not offer a way to use or predict information on structure between sub-corpora, time or domain, and dynamic embeddings can only be compared after post-alignment. We propose novel word embedding methods that provide general word representations for the whole corpus, domain-specific representations for each sub-corpus, sub-corpus structure, and embedding alignment simultaneously. We present an empirical evaluation on New York Times articles and two English Wikipedia datasets with articles on science and philosophy. Our method, called Word2Vec with Structure Prediction (W2VPred), provides better performance than baselines in terms of the general analogy tests, domain-specific analogy tests, and multiple specific word embedding evaluations as well as structure prediction performance when no structure is given a priori. As a use case in the field of Digital Humanities, we demonstrate how to raise novel research questions for high literature from the German Text Archive.

Publisher

MIT Press

Subject

Artificial Intelligence, Computer Science Applications, Linguistics and Language, Human-Computer Interaction, Communication

Link

https://direct.mit.edu/tacl/article-pdf/doi/10.1162/tacl_a_00538/2075946/tacl_a_00538.pdf

Cited by 1 article.

1. The Impact of Preprocessing Techniques Towards Word Embedding; Advances in Visual Informatics; 2023-10-20