Author:
Yang Juan, Zhou Guanghong, Wang Ronggui, Xue Lixia
Abstract
Existing pre-trained models have yielded promising results in reducing computational time. However, these models focus only on pruning simple sentences or less salient words, while neglecting relatively complex sentences, and it is frequently these sentences that cause the loss of model accuracy. This indicates that the adaptivity of existing models is one-sided. To address this issue, we propose a sample-adaptive training and inference model. Specifically, complex samples are extracted from the training datasets, and a dedicated data augmentation module is trained to extract their global and local semantic information. During inference, simple samples can exit the model early via the Sample Adaptive Exit Mechanism, normal samples pass through the whole backbone model before prediction, and complex samples are additionally processed by the Characteristic Enhancement Module after passing through the backbone. In this way, all samples are processed adaptively. Extensive experiments on classification datasets in Natural Language Processing demonstrate that our method improves accuracy and reduces inference time on multiple datasets. Moreover, our method is transferable and can be applied to multiple pre-trained models.
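The three-way routing described in the abstract could be sketched as follows. This is a minimal illustrative sketch only: the entropy-based exit criterion, the threshold values, and the module interfaces (`backbone_layers`, `classifiers`, `enhancement_module`) are assumptions for exposition, not the paper's actual implementation.

```python
# Sketch of sample-adaptive inference routing (assumed entropy-based criterion).
import torch
import torch.nn.functional as F


def predictive_entropy(logits: torch.Tensor) -> torch.Tensor:
    """Entropy of the softmax distribution; low entropy = confident prediction."""
    probs = F.softmax(logits, dim=-1)
    return -(probs * probs.clamp_min(1e-12).log()).sum(dim=-1)


def adaptive_inference(sample, backbone_layers, classifiers, enhancement_module,
                       exit_threshold=0.2, complex_threshold=1.0):
    """
    Route one sample through the model:
      * simple  -> exits at an intermediate layer once the prediction is confident
      * normal  -> runs the full backbone, then classifies
      * complex -> runs the full backbone, then a feature-enhancement module
    Thresholds and the enhancement interface are hypothetical placeholders.
    """
    hidden = sample
    for layer, clf in zip(backbone_layers, classifiers):
        hidden = layer(hidden)
        logits = clf(hidden)
        if predictive_entropy(logits).item() < exit_threshold:
            return logits                      # simple sample: early exit

    # Full backbone was needed; decide whether the sample counts as "complex".
    if predictive_entropy(logits).item() > complex_threshold:
        hidden = enhancement_module(hidden)    # complex sample: enhance features
        logits = classifiers[-1](hidden)
    return logits                              # normal (or enhanced complex) sample
```

In such a design, the per-layer classifiers supply the exit signal for simple samples, while the same confidence measure after the last layer decides whether the extra enhancement step is applied.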
Funder
National Natural Science Foundation of China
National Key R&D Program of China
Publisher
Springer Science and Business Media LLC