A cross-lingual sentence pair interaction feature capture model based on pseudo-corpus and multilingual embedding

Published:2022-05-10 Issue:1 Volume:35 Page:1-14
ISSN:1875-8452
Container-title:AI Communications
Short-container-title:AIC

Author:

Liu Gang¹²,Dong Yichao¹,Wang Kai¹,Yan Zhizheng¹

Affiliation:

1. College of Computer Science and Technology, Harbin Engineering University, China

2. State Key Laboratory for Novel Software Technology, Nanjing University, China

Abstract

Recently, the emergence of the digital language division and the availability of cross-lingual benchmarks have made research on cross-lingual texts more popular. However, the performance of existing methods based on mapping relations is not good enough, because the structures of language spaces are sometimes not isomorphic. In addition, polysemy makes interaction features hard to extract. For cross-lingual word embedding, a model named Cross-lingual Word Embedding Space Based on Pseudo Corpus (CWE-PC) is proposed to obtain cross-lingual and multilingual word embeddings. For capturing cross-lingual sentence pair interaction features, a Cross-language Feature Capture Based on Similarity Matrix (CFC-SM) model is built. It uses the ELMo pretrained model and multi-layer convolution to alleviate polysemy and extract interaction features. The models are evaluated on multiple language pairs, and the results show that they outperform state-of-the-art cross-lingual word embedding methods.

Publisher

IOS Press

Subject

Artificial Intelligence

Reference: 41 articles.

1. M. Artetxe, G. Labaka and E. Agirre, Learning bilingual word embeddings with (almost) no bilingual data, in: Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics, 2017, pp. 1017–1042.

2. M. Artetxe, G. Labaka and E. Agirre, A robust self-learning method for fully unsupervised cross-lingual mappings of word embeddings, in: Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics, 2018, pp. 789–798.

3. ArbEngVec: Arabic-English Cross-Lingual Word Embedding Model.

4. Azpiazu, Hierarchical mapping for crosslingual word embedding alignment, Transactions of the Association for Computational Linguistics, 2020.

5. Brychcin, Linear transformations for cross-lingual semantic textual similarity, Knowledge-Based Systems, 2020.

Cited by 3 articles.

1. Optimization Strategies and Efficacy Evaluation of Cross-Language Embedding Model in Teaching English in Colleges and Universities;Applied Mathematics and Nonlinear Sciences;2024-01-01

2. Entity alignment with fusing relation representation;AI Communications;2023-12-12

3. Bioinspired Flexible and Programmable Negative Stiffness Mechanical Metamaterials;Advanced Intelligent Systems;2023-02-22