Training Method and Device of Chemical Industry Chinese Language Model Based on Knowledge Distillation

Published:2021-12-13 Issue: Volume:2021 Page:1-9
ISSN:1875-919X
Container-title:Scientific Programming
language:en
Short-container-title:Scientific Programming

Author:

Li Wen-Ting¹, Gao Shang-Bing¹, Zhang Jun-Qiang¹, Guo Shu-Xing¹

Affiliation:

1. Faculty of Computer and Software Engineering, Huaiyin Institute of Technology, Huaian 223003, China

Abstract

Recent advances in pretrained language models have achieved state-of-the-art results on various natural language processing tasks. However, these large pretrained models are difficult to deploy in practical settings such as mobile and embedded devices. Moreover, no pretrained language model exists for the chemical industry. In this work, we propose a method for pretraining a smaller language representation model for the chemical industry domain. First, a large corpus of chemical industry texts is used for pretraining, and a nontraditional knowledge distillation technique is used to build a simplified model that learns the knowledge in the BERT model. By learning the embedding layer, the middle layer, and the prediction layer at different stages, the simplified model learns not only the probability distribution of the prediction layer but also the embedding layer and the middle layer, thereby acquiring the learning ability of the BERT model. Finally, the model is applied to downstream tasks. Experiments show that, compared with current BERT distillation methods, our method makes full use of the rich feature knowledge in the middle layer of the teacher model while building a student model based on the BiLSTM architecture. This effectively solves the problem that traditional student models based on the transformer architecture are too large, and it improves the accuracy of the language model in the chemical domain.
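
The abstract gives no formulas, so the following is only a minimal sketch of what a three-part distillation objective of this kind might look like, not the authors' implementation. It assumes a mean-squared-error term for the embedding layer, another for the middle layer, and a temperature-softened cross-entropy term for the prediction layer, which is a common choice in layer-wise BERT distillation. The names `distillation_loss`, `weights`, and `temperature`, and the dictionary keys, are hypothetical. It also assumes that student and teacher vectors have already been projected to the same dimension.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of logits, softened by a temperature."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def mse(a, b):
    """Mean squared error between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def soft_cross_entropy(student_logits, teacher_logits, temperature=1.0):
    """Cross-entropy of the student's softened distribution against the teacher's."""
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    return -sum(pt * math.log(ps) for pt, ps in zip(p_teacher, p_student))


def distillation_loss(student, teacher, weights=(1.0, 1.0, 1.0), temperature=1.0):
    """Weighted sum of embedding, middle-layer, and prediction-layer terms.

    `student` and `teacher` are dicts with the (hypothetical) keys
    "embedding", "hidden", and "logits", each holding a list of floats.
    """
    w_emb, w_mid, w_pred = weights
    emb_term = mse(student["embedding"], teacher["embedding"])
    mid_term = mse(student["hidden"], teacher["hidden"])
    pred_term = soft_cross_entropy(student["logits"], teacher["logits"], temperature)
    return w_emb * emb_term + w_mid * mid_term + w_pred * pred_term


# Toy example with made-up numbers (not data from the paper).
teacher = {"embedding": [0.2, 0.4], "hidden": [1.0, -1.0], "logits": [2.0, 0.5]}
student = {"embedding": [0.1, 0.5], "hidden": [0.8, -0.7], "logits": [1.5, 0.9]}
print(distillation_loss(student, teacher, temperature=2.0))
```

One way to mimic learning "at different stages" under this sketch would be to change `weights` between stages, for example `(1.0, 1.0, 0.0)` while fitting the embedding and middle layers and `(0.0, 0.0, 1.0)` while fitting the prediction layer. The abstract does not specify the actual schedule, so this too is an assumption.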

Funder

National Basic Research Program of China

Publisher

Hindawi Limited

Subject

Computer Science Applications,Software

Link

http://downloads.hindawi.com/journals/sp/2021/5753693.pdf
