Scalable Attentive Sentence Pair Modeling via Distilled Sentence Embedding

Published:2020-04-03 Issue:04 Volume:34 Page:3235-3242
ISSN:2374-3468
Container-title:Proceedings of the AAAI Conference on Artificial Intelligence
Short-container-title:AAAI

Author:

Barkan Oren, Razin Noam, Malkiel Itzik, Katz Ori, Caciularu Avi, Koenigstein Noam

Abstract

Recent state-of-the-art natural language understanding models, such as BERT and XLNet, score a pair of sentences (A and B) using multiple cross-attention operations, a process in which each word in sentence A attends to all words in sentence B and vice versa. As a result, computing the similarity between a query sentence and a set of candidate sentences requires propagating every query-candidate sentence pair through a stack of cross-attention layers. This exhaustive process becomes computationally prohibitive when the number of candidate sentences is large. In contrast, sentence embedding techniques learn a sentence-to-vector mapping and compute the similarity between sentence vectors via simple elementary operations. In this paper, we introduce Distilled Sentence Embedding (DSE), a model based on knowledge distillation from cross-attentive models that focuses on sentence-pair tasks. The outline of DSE is as follows: given a cross-attentive teacher model (e.g., a fine-tuned BERT), we train a sentence-embedding-based student model to reconstruct the sentence-pair scores obtained by the teacher model. We empirically demonstrate the effectiveness of DSE on five GLUE sentence-pair tasks. DSE significantly outperforms several ELMo variants and other sentence embedding methods, while accelerating the computation of query-candidate sentence-pair similarities by several orders of magnitude, with an average relative degradation of 4.6% compared to BERT. Furthermore, we show that DSE produces sentence embeddings that reach state-of-the-art performance on universal sentence representation benchmarks. Our code is made publicly available at https://github.com/microsoft/Distilled-Sentence-Embedding.
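
The two ideas in the abstract, reconstructing teacher scores and scoring candidates from precomputed vectors, can be illustrated with a minimal sketch. This is not the authors' implementation: `toy_embed` is a hypothetical stand-in for the student encoder, the dot-product pair score is an assumed choice, and mean squared error is assumed as the reconstruction objective, since the abstract says only that the student "reconstructs" the teacher's scores.

```python
import math
from typing import List, Sequence

Vector = List[float]


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    """Plain dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def toy_embed(sentence: str, dim: int = 8) -> Vector:
    """Hypothetical student encoder: hashed bag of words, L2-normalised.

    A real student would be a trained neural sentence encoder; this
    deterministic toy only makes the example runnable.
    """
    vec = [0.0] * dim
    for token in sentence.lower().split():
        # Deterministic bucket index (avoids Python's randomised hash()).
        vec[sum(ord(ch) for ch in token) % dim] += 1.0
    norm = math.sqrt(dot(vec, vec)) or 1.0
    return [x / norm for x in vec]


def distillation_loss(student_scores: Sequence[float],
                      teacher_scores: Sequence[float]) -> float:
    """Assumed objective: mean squared error between the student's pair
    scores and the scores produced by the cross-attentive teacher."""
    pairs = list(zip(student_scores, teacher_scores))
    return sum((s - t) ** 2 for s, t in pairs) / len(pairs)


def rank_candidates(query: str, candidate_vectors: Sequence[Vector]) -> List[int]:
    """Embed the query once, then score every candidate with a dot product.

    No candidate passes through the model at query time, which is where
    the speed-up over pairwise cross-attention comes from.
    """
    q = toy_embed(query)
    scores = [dot(q, c) for c in candidate_vectors]
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)


candidates = [
    "the cat sat on the mat",
    "stock prices fell sharply",
    "a cat is sitting on a mat",
]
# Candidate vectors are computed once, offline, and reused for every query.
candidate_vectors = [toy_embed(c) for c in candidates]
ranking = rank_candidates("the cat sat on the mat", candidate_vectors)
```

With a cross-attentive scorer, the same query would require one full forward pass per candidate. In this sketch the per-candidate cost at query time is a single dot product over cached vectors.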

Publisher

Association for the Advancement of Artificial Intelligence (AAAI)

Subject

General Medicine

Cited by 14 articles.

1. A Counterfactual Framework for Learning and Evaluating Explanations for Recommender Systems;Proceedings of the ACM Web Conference 2024;2024-05-13

2. Electrical Fault Diagnosis From Text Data: A Supervised Sentence Embedding Combined With Imbalanced Classification;IEEE Transactions on Industrial Electronics;2024-03

3. Stochastic Integrated Explanations for Vision Models;2023 IEEE International Conference on Data Mining (ICDM);2023-12-01

4. Efficient Discovery and Effective Evaluation of Visual Perceptual Similarity: A Benchmark and Beyond;2023 IEEE/CVF International Conference on Computer Vision (ICCV);2023-10-01

5. Modeling users’ heterogeneous taste with diversified attentive user profiles;User Modeling and User-Adapted Interaction;2023-08-01