Concentration or distraction? A synergetic-based attention weights optimization method

Published:2023-06-30 Issue:6 Volume:9 Page:7381-7393
ISSN:2199-4536
Container-title:Complex & Intelligent Systems
language:en
Short-container-title:Complex Intell. Syst.

Author:

Wang Zihao, Li Haifeng, Ma Lin, Jiang Feng

Abstract

The attention mechanism empowers deep learning to a broader range of applications, but the contribution of the attention module is highly controversial. Research on modern Hopfield networks indicates that the attention mechanism can also be used in shallow networks. Its automatic sample filtering facilitates instance extraction in Multiple Instances Learning tasks. Since the attention mechanism has a clear contribution and intuitive performance in shallow networks, this paper further investigates its optimization method based on the recurrent neural network. Through comprehensive comparison, we find that the Synergetic Neural Network has the advantage of more accurate and controllable convergences and revertible converging steps. Therefore, we design the Syn layer based on the Synergetic Neural Network and propose the novel invertible activation function as the forward and backward update formula for attention weights concentration or distraction. Experimental results show that our method outperforms other methods in all Multiple Instances Learning benchmark datasets. Concentration improves the robustness of the results, while distraction expands the instance observing space and yields better results. Code is available at https://github.com/wzh134/Syn.

Publisher

Springer Science and Business Media LLC

Subject

Computational Mathematics,Engineering (miscellaneous),Information Systems,Artificial Intelligence

Link

https://link.springer.com/content/pdf/10.1007/s40747-023-01133-0.pdf

References: 59 articles.

1. Weston J, Chopra S, Bordes A (2015) Memory networks. In: 3rd Int conf learn represent ICLR 2015—conf track proc. https://doi.org/10.1007/978-3-030-82184-5_11

2. Sukhbaatar S, Szlam A, Weston J, Fergus R (2015) End-to-end memory networks. In: Advances in neural information processing systems

3. Daniluk M, Rocktäschel T, Welbl J, Riedel S (2017) Frustratingly short attention spans in neural language modeling. CoRR abs/1702.0

4. Vaswani A, Shazeer N, Parmar N et al (2017) Attention is all you need. In: Advances in neural information processing systems. pp 5999–6009

5. Radford A, Narasimhan K, Salimans T, Sutskever I (2018) Improving language understanding by generative pre-training. Homol Homotopy Appl