Author:
Liu Jingshu, Morin Emmanuel, Peña Saldarriaga Sebastian, Lark Joseph
Abstract
Significant advances have been achieved in bilingual word-level alignment, yet phrase-level alignment remains a challenge. Moreover, the need for parallel data is a critical drawback for the alignment task. This work proposes a system that alleviates these two problems: a unified phrase representation model using cross-lingual word embeddings as input, and an unsupervised training algorithm inspired by recent work on neural machine translation. The system consists of a sequence-to-sequence architecture in which a short sequence encoder constructs cross-lingual representations of phrases of any length, and an LSTM network then decodes them with respect to their contexts. After training with comparable corpora and existing key phrase extraction, our encoder provides cross-lingual phrase representations that can be compared without further transformation. Experiments on five data sets show that our method obtains state-of-the-art results on the bilingual phrase alignment task and improves the results for aligning phrases of different lengths by a mean of 8.8 points in MAP.
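The evaluation setting described in the abstract (phrase representations compared directly, results reported in MAP) can be illustrated with a minimal sketch. It is not the authors' implementation. The toy vectors, the phrase pairs, the use of cosine similarity, and the function names `cosine`, `rank_candidates`, `average_precision` and `mean_average_precision` are all assumptions made for illustration, and the textbook definition of average precision is used, which may differ from the exact MAP variant used in the paper.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def rank_candidates(source_vec, target_vecs):
    """Return target phrases sorted by decreasing similarity to the source vector."""
    return sorted(target_vecs,
                  key=lambda phrase: cosine(source_vec, target_vecs[phrase]),
                  reverse=True)

def average_precision(ranked, gold):
    """Textbook average precision of one ranked list against a set of gold answers."""
    hits, total = 0, 0.0
    for position, candidate in enumerate(ranked, start=1):
        if candidate in gold:
            hits += 1
            total += hits / position
    return total / len(gold) if gold else 0.0

def mean_average_precision(source_vecs, target_vecs, gold_pairs):
    """Mean of the per-source-phrase average precision scores."""
    scores = [
        average_precision(rank_candidates(vec, target_vecs), gold_pairs[phrase])
        for phrase, vec in source_vecs.items()
    ]
    return sum(scores) / len(scores)

# Hypothetical 3-dimensional phrase vectors, assumed to live in one shared
# cross-lingual space so that no further mapping is needed before comparison.
source = {
    "wind turbine": [0.9, 0.1, 0.0],
    "blade": [0.0, 1.0, 0.1],
    "rotor hub": [0.2, 0.8, 0.1],
}
target = {
    "éolienne": [0.8, 0.2, 0.0],
    "pale": [0.1, 0.9, 0.0],
    "rotor": [0.5, 0.5, 0.5],
}
# Hypothetical gold alignments, invented for this example only.
gold = {
    "wind turbine": {"éolienne"},
    "blade": {"pale"},
    "rotor hub": {"rotor"},
}

map_score = mean_average_precision(source, target, gold)
print(f"MAP = {map_score:.4f}")
```

With a single gold translation per source phrase, as in this toy data, average precision reduces to the reciprocal of the rank at which the correct translation appears.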
Publisher
Cambridge University Press (CUP)
Subject
Artificial Intelligence, Linguistics and Language, Language and Linguistics, Software
Cited by
1 article.
1. Computational Terminology; New Frontiers in Translation Studies; 2024