Improving sentence representation for Vietnamese natural language understanding using optimal transport
-
Published: 2023-12-02
Issue: 6
Volume: 45
Pages: 9277-9288
-
ISSN: 1064-1246
-
Container-title: Journal of Intelligent & Fuzzy Systems
-
Language: English
-
Short-container-title: IFS
Author:
Nguyen Phu Xuan-Vinh (1), Nguyen Thu Hoang-Thien (2), Van Nguyen Kiet (1), Nguyen Ngan Luu-Thuy (1)
Affiliation:
1. University of Information Technology, Vietnam National University, Ho Chi Minh City, Vietnam
2. International University, Vietnam National University, Ho Chi Minh City, Vietnam
Abstract
Multilingual pre-trained language models have achieved impressive results on most natural language processing tasks. However, their performance is inhibited by capacity limitations and the under-representation of some languages in the pre-training data, especially languages with limited resources. This has led to the creation of tailored pre-trained language models, which are pre-trained on large amounts of monolingual data or on domain-specific corpora. Nevertheless, compared to relying on multiple monolingual models, utilizing multilingual models offers the advantage of multilinguality, such as generalization across cross-lingual resources. To combine the advantages of both multilingual and monolingual models, we propose KDDA, a framework that transfers knowledge from monolingual models into a single multilingual model with the aim of improving sentence representation for Vietnamese. KDDA employs a teacher-student framework and cross-lingual transfer, adopting knowledge from two monolingual models (teachers) and transferring it into a unified multilingual model (student). Since the representations of the teachers and the student lie in disparate semantic spaces, we measure the discrepancy between their distributions using Sinkhorn Divergence, an optimal transport distance. We conduct experiments on two Vietnamese natural language understanding tasks: machine reading comprehension and natural language inference. Experimental results show that our model outperforms other state-of-the-art models and yields competitive performance.
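For concreteness, the following is a minimal sketch (not the authors' implementation) of a debiased Sinkhorn Divergence used as a discrepancy measure between batches of teacher and student sentence embeddings. The function names, embedding shapes, regularization strength eps, and iteration count are illustrative assumptions, not values from the paper.

# Minimal sketch of Sinkhorn Divergence between two sets of sentence embeddings.
# All names and hyperparameters here are illustrative assumptions.
import numpy as np
from scipy.special import logsumexp

def entropic_ot_cost(x, y, eps=0.1, n_iters=200):
    """Entropy-regularized OT cost <P, C> between two point clouds with uniform
    weights, using numerically stable log-domain Sinkhorn iterations."""
    n, m = x.shape[0], y.shape[0]
    C = np.sum((x[:, None, :] - y[None, :, :]) ** 2, axis=-1)  # squared Euclidean cost
    log_a = np.full(n, -np.log(n))  # uniform source weights (log domain)
    log_b = np.full(m, -np.log(m))  # uniform target weights (log domain)
    f, g = np.zeros(n), np.zeros(m)  # dual potentials
    for _ in range(n_iters):
        f = -eps * logsumexp((g[None, :] - C) / eps + log_b[None, :], axis=1)
        g = -eps * logsumexp((f[:, None] - C) / eps + log_a[:, None], axis=0)
    log_P = (f[:, None] + g[None, :] - C) / eps + log_a[:, None] + log_b[None, :]
    return float(np.sum(np.exp(log_P) * C))  # transport cost under the optimal plan

def sinkhorn_divergence(x, y, eps=0.1, n_iters=200):
    """Debiased Sinkhorn Divergence: OT_eps(x,y) - 0.5*OT_eps(x,x) - 0.5*OT_eps(y,y)."""
    return (entropic_ot_cost(x, y, eps, n_iters)
            - 0.5 * entropic_ot_cost(x, x, eps, n_iters)
            - 0.5 * entropic_ot_cost(y, y, eps, n_iters))

# Hypothetical usage: a batch of teacher vs. student sentence representations.
rng = np.random.default_rng(0)
teacher = rng.standard_normal((32, 768))
student = rng.standard_normal((32, 768))
teacher /= np.linalg.norm(teacher, axis=1, keepdims=True)  # L2-normalize so eps=0.1 is on scale
student /= np.linalg.norm(student, axis=1, keepdims=True)
print(sinkhorn_divergence(teacher, student))

In a distillation setup such as the one described above, this quantity would typically be minimized alongside the task loss so that the student's representation space is pulled toward the teachers'.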
Subject
Artificial Intelligence, General Engineering, Statistics and Probability