DiSAN: Directional Self-Attention Network for RNN/CNN-Free Language Understanding

Published: 2018-04-27 Issue: 1 Volume: 32
ISSN: 2374-3468
Container-title: Proceedings of the AAAI Conference on Artificial Intelligence
Short-container-title: AAAI

Author:

Shen Tao, Zhou Tianyi, Long Guodong, Jiang Jing, Pan Shirui, Zhang Chengqi

Abstract

Recurrent neural nets (RNN) and convolutional neural nets (CNN) are widely used on NLP tasks to capture the long-term and local dependencies, respectively. Attention mechanisms have recently attracted enormous interest due to their highly parallelizable computation, significantly less training time, and flexibility in modeling dependencies. We propose a novel attention mechanism in which the attention between elements from input sequence(s) is directional and multi-dimensional (i.e., feature-wise). A light-weight neural net, "Directional Self-Attention Network (DiSAN)," is then proposed to learn sentence embedding, based solely on the proposed attention without any RNN/CNN structure. DiSAN is only composed of a directional self-attention with temporal order encoded, followed by a multi-dimensional attention that compresses the sequence into a vector representation. Despite its simple form, DiSAN outperforms complicated RNN models on both prediction quality and time efficiency. It achieves the best test accuracy among all sentence encoding methods and improves the most recent best result by 1.02% on the Stanford Natural Language Inference (SNLI) dataset, and shows state-of-the-art test accuracy on the Stanford Sentiment Treebank (SST), Multi-Genre natural language inference (MultiNLI), Sentences Involving Compositional Knowledge (SICK), Customer Review, MPQA, TREC question-type classification and Subjectivity (SUBJ) datasets.
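The two ideas named in the abstract, attention that is multi-dimensional (one weight per feature rather than one per token) and directional (restricted to one side of each position), can be illustrated with a minimal NumPy sketch. Everything below is an assumption made for illustration: the abstract gives no formulas, so the additive tanh scoring, the parameter names (`w1`, `w2`, `b`, `w`), and the choice to let each token attend to itself are hypothetical simplifications, not the authors' exact formulation.

```python
import numpy as np

def softmax(x, axis):
    # Numerically stable softmax along the given axis.
    x = x - np.max(x, axis=axis, keepdims=True)
    e = np.exp(x)
    return e / np.sum(e, axis=axis, keepdims=True)

def directional_mask(n, forward=True):
    # mask[i, j] is 0 where position j may attend to position i and a large
    # negative number otherwise. Assumption: the diagonal is kept so every
    # position has at least one permitted source.
    idx = np.arange(n)
    if forward:
        allowed = idx[:, None] <= idx[None, :]   # j sees itself and earlier tokens
    else:
        allowed = idx[:, None] >= idx[None, :]   # j sees itself and later tokens
    return np.where(allowed, 0.0, -1e9)

def directional_feature_wise_attention(h, w1, w2, b, forward=True):
    # h: (n, d) token features. scores[i, j, k] is the score of source token i
    # for target token j on feature k (hypothetical additive scoring).
    n, _ = h.shape
    scores = np.tanh((h @ w1)[:, None, :] + (h @ w2)[None, :, :] + b)
    scores = scores + directional_mask(n, forward)[:, :, None]
    weights = softmax(scores, axis=0)            # normalise over sources i
    return np.einsum("ijk,ik->jk", weights, h)   # (n, d) context per token

def feature_wise_pooling(h, w, b):
    # Compress the sequence into one vector with a separate attention
    # distribution over tokens for every feature.
    weights = softmax(h @ w + b, axis=0)         # (n, d), columns sum to 1
    return np.sum(weights * h, axis=0)           # (d,)

rng = np.random.default_rng(0)
n, d = 5, 4
h = rng.normal(size=(n, d))
w1, w2, w = (rng.normal(size=(d, d)) for _ in range(3))
b = np.zeros(d)

fw = directional_feature_wise_attention(h, w1, w2, b, forward=True)
bw = directional_feature_wise_attention(h, w1, w2, b, forward=False)
sentence_vector = feature_wise_pooling(np.concatenate([fw, bw], axis=1),
                                       rng.normal(size=(2 * d, 2 * d)),
                                       np.zeros(2 * d))
print(fw.shape, bw.shape, sentence_vector.shape)
```

With the forward mask, the first token can only attend to itself, so its context vector equals its own features; the backward mask gives the same property for the last token. Concatenating both directions before pooling is one plausible way to encode temporal order without recurrence.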

Publisher

Association for the Advancement of Artificial Intelligence (AAAI)

Subject

General Medicine

Cited by 138 articles.

1. Soft Contrastive Sequential Recommendation; ACM Transactions on Information Systems; 2024-08-19

2. Large language models (LLMs): survey, technical frameworks, and future challenges; Artificial Intelligence Review; 2024-08-18

3. A method for extracting buildings from remote sensing images based on 3DJA-UNet3+; Scientific Reports; 2024-08-17

4. FCCS-Net: Breast cancer classification using Multi-Level fully Convolutional-Channel and spatial attention-based transfer learning approach; Biomedical Signal Processing and Control; 2024-08

5. Dynamic spatial aware graph transformer for spatiotemporal traffic flow forecasting; Knowledge-Based Systems; 2024-08