Author:
Wei Shuyong, Yu Defa, Lv Chenguo
Abstract
BERT is a pre-trained language model. Although the model has proven highly performant on a variety of natural language understanding tasks, its large size makes it hard to deploy in practical settings where computing resources are limited. To improve the efficiency of BERT for the sentiment analysis task, we propose a novel distilled version of BERT. It distills knowledge from the full-size BERT model, which serves as the teacher. The distilled model efficiently learns the teacher's last hidden state and soft labels, a design that differs from previous distilled models. We use a distillation learning objective that effectively transfers knowledge from the original large model to the compact one. Our model reduces the BERT model size by ∼40% while retaining ∼98.2% of its performance on the sentiment classification task. It achieves promising results on SST-2 sentiment analysis and outperforms the previous distilled model.
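To make the training objective more concrete, the following is a minimal sketch of a distillation loss that combines a hard-label term, a soft-label term on temperature-softened teacher logits, and a last-hidden-state alignment term, in the spirit of what the abstract describes. The function name `distillation_loss`, the temperature, and the weights `alpha` and `beta` are illustrative assumptions, not the authors' reported settings, and it assumes the student and teacher hidden states have matching (or pre-projected) sizes.

```python
# Hypothetical sketch of a distillation objective for a compact BERT student.
# All names, weights, and the temperature are illustrative assumptions.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits,
                      student_hidden, teacher_hidden,
                      labels, temperature=2.0, alpha=0.5, beta=0.3):
    """Weighted sum of hard-label CE, soft-label KL, and hidden-state alignment."""
    # Hard-label cross-entropy against the ground-truth sentiment labels.
    ce = F.cross_entropy(student_logits, labels)

    # Soft-label loss: KL divergence between temperature-softened
    # student and teacher class distributions.
    kl = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * (temperature ** 2)

    # Last-hidden-state alignment via a cosine embedding loss
    # (target of +1 pulls student and teacher representations together).
    batch = student_hidden.size(0)
    target = torch.ones(batch, device=student_hidden.device)
    cos = F.cosine_embedding_loss(
        student_hidden.reshape(batch, -1),
        teacher_hidden.reshape(batch, -1),
        target,
    )

    return (1 - alpha - beta) * ce + alpha * kl + beta * cos
```

In practice the relative weights would be tuned on a development set, and the alignment term would typically be applied to the last hidden states that feed the classification head.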
Subject
General Physics and Astronomy
Cited by
3 articles.