Interpretable Adversarial Perturbation in Input Embedding Space for Text

Published: 2018-07
Container-title: Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence

Author:

Sato Motoki¹, Suzuki Jun²³, Shindo Hiroyuki⁴³, Matsumoto Yuji⁴³

Affiliation:

1. Preferred Networks, Inc.

2. NTT Communication Science Laboratories

3. RIKEN AIP

4. Nara Institute of Science and Technology

Abstract

Following its great success in the image processing field, adversarial training has been applied to tasks in natural language processing (NLP). One promising approach applies adversarial training, as developed for image processing, directly to the input word embedding space instead of the discrete input space of texts. However, this approach gives up interpretability, such as the ability to generate adversarial texts, in exchange for significantly improved performance on NLP tasks. This paper restores interpretability to such methods by restricting the directions of perturbations so that they point toward existing words in the input embedding space. As a result, we can straightforwardly reconstruct each perturbed input as actual text by treating the perturbations as replacements of words in the sentence, while maintaining or even improving task performance.
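The idea of restricting perturbations to point toward existing words can be illustrated with a small sketch. This is not the authors' implementation: the toy vocabulary `VOCAB`, the gradient `GRAD`, the function names, and the rule of weighting each direction by its first-order effect on the loss are all assumptions made for illustration. In practice the gradient would come from backpropagation through the task model.

```python
import math

def unit_directions(vocab, word):
    """Unit vectors pointing from `word` toward every other vocabulary word."""
    src = vocab[word]
    dirs = {}
    for other, vec in vocab.items():
        if other == word:
            continue
        diff = [b - a for a, b in zip(src, vec)]
        norm = math.sqrt(sum(x * x for x in diff))
        if norm > 0.0:
            dirs[other] = [x / norm for x in diff]
    return dirs

def restricted_perturbation(vocab, word, grad, epsilon=1.0):
    """Build a perturbation as a weighted sum of directions toward real words.

    `grad` is assumed to be the loss gradient with respect to the embedding
    of `word`. Each direction is weighted by its dot product with `grad`
    (an assumed first-order scoring rule), and the weight vector is rescaled
    to have L2 norm `epsilon`.
    """
    dirs = unit_directions(vocab, word)
    scores = {w: sum(g * x for g, x in zip(grad, d)) for w, d in dirs.items()}
    norm = math.sqrt(sum(s * s for s in scores.values()))
    if norm == 0.0:
        # No direction changes the loss to first order: no perturbation.
        return [0.0] * len(grad), {w: 0.0 for w in dirs}, None
    weights = {w: epsilon * s / norm for w, s in scores.items()}
    perturbation = [
        sum(weights[w] * dirs[w][i] for w in dirs) for i in range(len(grad))
    ]
    # The word with the largest weight is read off as the replacement.
    replacement = max(weights, key=weights.get)
    return perturbation, weights, replacement

# Hypothetical 2-D embeddings and gradient, chosen only for illustration.
VOCAB = {
    "good": [1.0, 0.0],
    "great": [1.0, 1.0],
    "bad": [-1.0, 0.0],
}
GRAD = [-1.0, 0.0]

perturbation, weights, replacement = restricted_perturbation(VOCAB, "good", GRAD)
print(replacement)  # prints: bad
```

Because every component of the perturbation is tied to a specific vocabulary word, the weights can be inspected directly: here the perturbation of "good" moves entirely toward "bad", which is what makes it readable as a word replacement.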

Publisher

International Joint Conferences on Artificial Intelligence Organization

Cited by 45 articles.

1. Noisy Communication Modeling for Improved Cooperation in Codenames; 2024 IEEE Conference on Games (CoG); 2024-08-05

2. Adversarial attacks and defenses for large language models (LLMs): methods, frameworks & challenges; International Journal of Multimedia Information Retrieval; 2024-06-25

3. Unsupervised Traditional Chinese Herb Mention Normalization via Robustness-Promotion Oriented Self-supervised Training; Lecture Notes in Computer Science; 2024

4. Multilingual mixture attention interaction framework with adversarial training for cross-lingual SLU; Neural Computing and Applications; 2023-11-18

5. Adversary for Social Good: Leveraging Adversarial Attacks to Protect Personal Attribute Privacy; ACM Transactions on Knowledge Discovery from Data; 2023-11-13