A neural approach for inducing multilingual resources and natural language processing tools for low-resource languages

Published: 2018-08-06
Journal: Natural Language Engineering (Nat. Lang. Eng.)
Volume: 25, Issue: 1, Pages: 43–67
ISSN: 1351-3249
Language: English

Authors

Zennaki O., Semmar N., Besacier L.

Abstract

This work focuses on the rapid development of linguistic annotation tools for low-resource languages (languages that have no labeled training data). We experiment with several cross-lingual annotation projection methods using recurrent neural network (RNN) models. The distinctive feature of our approach is that our multilingual word representation requires only a parallel corpus between the source and target languages. More precisely, our approach (a) does not use word alignment information; (b) assumes no knowledge of the target languages, apart from the requirement that the source and target languages are not too syntactically divergent, which makes it applicable to a wide range of low-resource languages; and (c) provides genuinely multilingual taggers (one tagger for N languages). We investigate both unidirectional and bidirectional RNN models and propose a method to include external information (for instance, low-level information from part-of-speech tags) in the RNN to train higher-level taggers (for instance, Super Sense taggers). We demonstrate the validity and genericity of our model using parallel corpora obtained by manual or automatic translation. Our experiments induce cross-lingual part-of-speech and Super Sense taggers. We also apply our approach in a weakly supervised setting, where it shows excellent potential for very low-resource scenarios (fewer than 1k training utterances).
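The abstract says the shared multilingual word representation needs only a parallel corpus and no word alignment. The sketch below shows one simple way this can work. It assumes each word is represented by the sentence pairs it occurs in, with binary weights; this is a simplification for illustration, and the paper's exact weighting and dimensionality may differ. Because source and target words are indexed by the same sentence pairs, they share one vector space, so translation-equivalent words get similar vectors without any alignment step. The toy corpus and the helper names `occurrence_vectors` and `cosine` are hypothetical.

```python
from math import sqrt

# Hypothetical toy parallel corpus: each entry is one sentence pair
# (source sentence, target sentence).
parallel = [
    ("the cat sleeps", "le chat dort"),
    ("the dog sleeps", "le chien dort"),
    ("a cat eats", "un chat mange"),
]


def occurrence_vectors(corpus, side):
    """Map each word on one side (0 = source, 1 = target) to a binary
    vector whose i-th entry is 1 if the word occurs in sentence pair i."""
    n = len(corpus)
    vecs = {}
    for i, pair in enumerate(corpus):
        for word in pair[side].split():
            vecs.setdefault(word, [0] * n)[i] = 1
    return vecs


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


src = occurrence_vectors(parallel, 0)
tgt = occurrence_vectors(parallel, 1)

# Both sides live in the same space (one dimension per sentence pair),
# so a source word can be compared directly with every target word.
best = max(tgt, key=lambda w: cosine(src["cat"], tgt[w]))
print(best)  # prints "chat"
```

Here "cat" and "chat" both occur in pairs 0 and 2, so their vectors are identical. In a real system, vectors like these would feed the RNN tagger trained on the annotated source side. The tagger could then be applied to target-language text, because both languages share the same input space.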

Publisher

Cambridge University Press (CUP)

Subject

Artificial Intelligence, Linguistics and Language, Language and Linguistics, Software

References: 65 articles (4 shown)

1. Supervised Sequence Labelling

2. Yarowsky D., Ngai G., and Wicentowski R. 2001. Inducing multilingual text analysis tools via robust projection across aligned corpora. In Proceedings of the 1st International Conference on Human Language Technology Research, pp. 1–8.

3. Veronis. 2000. Annotation automatique de corpus: panorama et état de la technique [Automatic corpus annotation: overview and state of the art]. Ingénierie des langues.

4. Sutskever I., Vinyals O., and Le Q. V. 2014. Sequence to sequence learning with neural networks. In Proceedings of the Advances in Neural Information Processing Systems, pp. 3104–3112.

Cited by 6 articles.

1. Application of an Improved Convolutional Neural Network Algorithm in Text Classification. Journal of Web Engineering, 2024-05-25.

2. From Pre-Training to Meta-Learning: A Journey in Low-Resource-Language Representation Learning. IEEE Access, 2023.

3. Beyond the Benchmarks: Toward Human-Like Lexical Representations. Frontiers in Artificial Intelligence, 2022-05-24.

4. Domain Adaptation for POS Tagging with Contrastive Monotonic Chunk-wise Attention. Neural Processing Letters, 2022-02-02.

5. Meemi: A simple method for post-processing and integrating cross-lingual word embeddings. Natural Language Engineering, 2021-10-13.