Improving sentence representation for Vietnamese natural language understanding using optimal transport

Authors:

Nguyen Phu Xuan-Vinh1, Nguyen Thu Hoang-Thien2, Van Nguyen Kiet1, Nguyen Ngan Luu-Thuy1

Affiliation:

1. University of Information Technology, Vietnam National University, Ho Chi Minh City, Vietnam

2. International University, Vietnam National University, Ho Chi Minh City, Vietnam

Abstract

Multilingual pre-trained language models have achieved impressive results on most natural language processing tasks. However, their performance is limited by capacity constraints and by the under-representation of many languages, especially low-resource ones, in the pre-training data. This has led to the creation of tailored pre-trained language models, which are pre-trained on large amounts of monolingual data or on domain-specific corpora. Nevertheless, compared to relying on multiple monolingual models, utilizing a multilingual model offers the advantage of multilinguality, such as generalization across cross-lingual resources. To combine the advantages of both multilingual and monolingual models, we propose KDDA, a framework that transfers knowledge from monolingual models into a single multilingual model with the aim of improving sentence representation for Vietnamese. KDDA employs a teacher-student framework and cross-lingual transfer to adopt knowledge from two monolingual models (teachers) and transfer it into a unified multilingual model (student). Since the representations from the teachers and the student lie in disparate semantic spaces, we measure the discrepancy between their distributions using the Sinkhorn Divergence, an optimal transport distance. We conduct experiments on two Vietnamese natural language understanding tasks: machine reading comprehension and natural language inference. Experimental results show that our model outperforms other state-of-the-art models and yields competitive performance.
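
For illustration, the following is a minimal sketch, not the paper's released code, of how a Sinkhorn Divergence between two batches of sentence representations (for example, a monolingual teacher's outputs and the multilingual student's outputs) can be computed with log-domain Sinkhorn iterations. The function names, the epsilon value, the iteration count, and the uniform weighting of sentences are all assumptions made for this example.

# Hedged sketch of a Sinkhorn (optimal transport) divergence between two sets
# of sentence embeddings; names and hyperparameters are illustrative only.
import numpy as np
from scipy.special import logsumexp

def entropic_ot_cost(X, Y, epsilon=0.1, n_iters=200):
    """Entropy-regularized OT cost <P, C> between uniform point clouds X and Y,
    computed with log-domain Sinkhorn updates for numerical stability."""
    n, m = len(X), len(Y)
    log_a = np.full(n, -np.log(n))                  # uniform source weights
    log_b = np.full(m, -np.log(m))                  # uniform target weights
    C = np.sum((X[:, None, :] - Y[None, :, :]) ** 2, axis=-1)  # squared-L2 costs
    f, g = np.zeros(n), np.zeros(m)                 # dual potentials
    for _ in range(n_iters):
        f = -epsilon * logsumexp((g[None, :] - C) / epsilon + log_b[None, :], axis=1)
        g = -epsilon * logsumexp((f[:, None] - C) / epsilon + log_a[:, None], axis=0)
    log_P = log_a[:, None] + log_b[None, :] + (f[:, None] + g[None, :] - C) / epsilon
    return float(np.sum(np.exp(log_P) * C))         # transport cost under plan P

def sinkhorn_divergence(X, Y, epsilon=0.1, n_iters=200):
    """Debiased divergence S(X, Y) = OT(X, Y) - (OT(X, X) + OT(Y, Y)) / 2."""
    return (entropic_ot_cost(X, Y, epsilon, n_iters)
            - 0.5 * entropic_ot_cost(X, X, epsilon, n_iters)
            - 0.5 * entropic_ot_cost(Y, Y, epsilon, n_iters))

# Toy usage: random stand-ins for teacher/student sentence embeddings,
# L2-normalized so the cost scale matches the chosen epsilon.
rng = np.random.default_rng(0)
teacher = rng.normal(size=(32, 768))
teacher /= np.linalg.norm(teacher, axis=1, keepdims=True)
student = rng.normal(size=(32, 768))
student /= np.linalg.norm(student, axis=1, keepdims=True)
print(sinkhorn_divergence(teacher, student))

In a distillation setting such as the one described in the abstract, a divergence of this form would typically be minimized as a training loss so that the student's representation distribution moves toward the teachers'; the exact loss formulation used by KDDA is not reproduced here.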

Publisher

IOS Press

Subject

Artificial Intelligence, General Engineering, Statistics and Probability
