SKDBERT: Compressing BERT via Stochastic Knowledge Distillation

Published:2023-06-26 Issue:6 Volume:37 Page:7414-7422
ISSN:2374-3468
Container-title:Proceedings of the AAAI Conference on Artificial Intelligence
Short-container-title:AAAI

Author:

Ding Zixiang, Jiang Guoqing, Zhang Shuai, Guo Lin, Lin Wei

Abstract

In this paper, we propose Stochastic Knowledge Distillation (SKD) to obtain a compact BERT-style language model dubbed SKDBERT. In each distillation iteration, SKD samples a teacher model from a pre-defined teacher team, which consists of multiple teacher models with multi-level capacities, and transfers its knowledge to the student model in a one-to-one manner. The sampling distribution plays an important role in SKD. We heuristically present three types of sampling distributions that assign appropriate probabilities to the multi-level teacher models. SKD has two advantages: 1) it preserves the diversity of the multi-level teacher models by stochastically sampling a single teacher model in each distillation iteration, and 2) it improves the efficacy of knowledge distillation via multi-level teacher models when a large capacity gap exists between the teacher model and the student model. Experimental results on the GLUE benchmark show that SKDBERT reduces the size of a BERT model by 40% while retaining 99.5% of its language understanding performance and being 100% faster.
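
The per-iteration teacher sampling described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the teacher team, its fixed logits, the uniform sampling distribution, the temperature, and the names `kd_loss` and `skd_loop` are all assumptions made for illustration, and the abstract does not specify the three sampling distributions used in the paper. Each step draws exactly one teacher and computes a standard temperature-softened KL distillation loss against the student.

```python
import math
import random
from collections import Counter


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of logits."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kd_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) on temperature-softened distributions."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def skd_loop(teacher_team, probs, student_logits, steps, seed=0):
    """Sample one teacher per step and record the loss against it."""
    rng = random.Random(seed)
    names = list(teacher_team)
    picks = Counter()
    losses = []
    for _ in range(steps):
        # One-to-one transfer: a single teacher is drawn for this step.
        name = rng.choices(names, weights=probs, k=1)[0]
        picks[name] += 1
        losses.append(kd_loss(student_logits, teacher_team[name]))
        # A real training step would backpropagate this loss into the student.
    return picks, losses


# Hypothetical teacher team: fixed logits stand in for real model outputs.
TEACHER_TEAM = {
    "small": [1.0, 0.5, -0.5],
    "medium": [2.0, 0.2, -1.0],
    "large": [3.0, -0.5, -2.0],
}
UNIFORM = [1 / 3, 1 / 3, 1 / 3]  # assumed distribution, for illustration only

picks, losses = skd_loop(TEACHER_TEAM, UNIFORM, [0.5, 0.3, 0.0], steps=300)
print(dict(picks))
```

Changing the `probs` argument is the only thing needed to try a different sampling distribution, for example one that favours smaller teachers when the capacity gap to the student is large.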

Publisher

Association for the Advancement of Artificial Intelligence (AAAI)

Subject

General Medicine

Cited by 3 articles.

1. Sequence-Wise Distillation Method for Efficient Chinese Heritage Language Error Correction;2024 5th International Seminar on Artificial Intelligence, Networking and Information Technology (AINIT);2024-03-29

2. Functional Analysis of English Carriers and Related Resources of Cultural Communication in Internet Media;Economics;2024-01-01

3. RoBERTa-CoA: RoBERTa-Based Effective Finetuning Method Using Co-Attention;IEEE Access;2023