Affiliation:
1. Harbin Institute of Technology, China
Abstract
Cross-lingual word embeddings represent the vocabularies of two or more languages in one shared continuous vector space and are widely used in various natural language processing tasks. A state-of-the-art way to generate cross-lingual word embeddings is to learn a linear mapping, under the assumption that the vector representations of similar words in different languages are related by a linear relationship. However, this assumption does not always hold, especially for substantially different languages. We therefore propose to use kernel canonical correlation analysis (KCCA) to capture a non-linear relationship between the word embeddings of two languages. By extensively evaluating the learned word embeddings on three tasks (word similarity, cross-lingual dictionary induction, and cross-lingual document classification) across five language pairs, we demonstrate that our proposed approach achieves substantially better performance than previous linear methods on all three tasks, especially for language pairs with substantial typological differences.
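The paper's exact formulation is not reproduced on this page; as a rough illustration of the kernel CCA idea the abstract describes, the sketch below computes regularized kernel canonical correlations between two embedding matrices using an RBF kernel. The kernel choice and the `gamma` and `reg` hyperparameters are illustrative assumptions, not values from the paper.

```python
import numpy as np

def rbf_kernel(A, B, gamma=1.0):
    """RBF (Gaussian) kernel matrix between rows of A and rows of B."""
    sq_dists = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-gamma * sq_dists)

def kcca(X, Y, gamma=1.0, reg=1e-3):
    """Return canonical correlations (descending) between X and Y via
    regularized kernel CCA. X, Y: (n, d1) and (n, d2) embedding matrices
    whose rows are aligned (e.g. translation pairs)."""
    n = X.shape[0]
    # Center the kernel matrices in feature space.
    H = np.eye(n) - np.ones((n, n)) / n
    Kx = H @ rbf_kernel(X, X, gamma) @ H
    Ky = H @ rbf_kernel(Y, Y, gamma) @ H
    # Regularized products; eigenvalues of M are squared correlations.
    Rx = np.linalg.solve(Kx + reg * n * np.eye(n), Ky)
    Ry = np.linalg.solve(Ky + reg * n * np.eye(n), Kx)
    M = Rx @ Ry
    eigvals = np.linalg.eigvals(M).real
    return np.sqrt(np.clip(np.sort(eigvals)[::-1], 0.0, 1.0))
```

In practice the projection directions recovered alongside these correlations would map both languages' embeddings into the shared space; here only the correlation spectrum is computed for brevity.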
Funder
National Natural Science Foundation of China
Publisher
Association for Computing Machinery (ACM)
Cited by 4 articles.