Seq2Sick: Evaluating the Robustness of Sequence-to-Sequence Models with Adversarial Examples

Published: 2020-04-03 Issue: 04 Volume: 34 Pages: 3601-3608
ISSN: 2374-3468
Container-title: Proceedings of the AAAI Conference on Artificial Intelligence
Short-container-title: AAAI

Author:

Cheng Minhao, Yi Jinfeng, Chen Pin-Yu, Zhang Huan, Hsieh Cho-Jui

Abstract

Crafting adversarial examples has become an important technique for evaluating the robustness of deep neural networks (DNNs). However, most existing work focuses on attacking image classification, where the input space is continuous and the output space is finite. In this paper, we study the much more challenging problem of crafting adversarial examples for sequence-to-sequence (seq2seq) models, whose inputs are discrete text strings and whose outputs have an almost infinite number of possibilities. To address the challenges caused by the discrete input space, we propose a projected gradient method combined with group lasso and gradient regularization. To handle the almost infinite output space, we design novel loss functions for two attack types: the non-overlapping attack and the targeted keyword attack. We apply our algorithm to machine translation and text summarization tasks and verify its effectiveness: by changing fewer than three words, we can make a seq2seq model produce the desired outputs with high success rates. We also use an external sentiment classifier to verify that the generated adversarial examples preserve semantic meaning. On the other hand, we find that, compared with well-evaluated CNN-based classifiers, seq2seq models are intrinsically more robust to adversarial attacks.
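The abstract names a projected gradient method combined with group lasso but gives no details, so the following is only a minimal NumPy sketch of how such a step might look, not the authors' implementation. It assumes the perturbation is a matrix with one row per input word, that the group lasso penalty is handled with a row-wise soft-thresholding (proximal) step, and that projection means snapping each perturbed row to its nearest vocabulary embedding. The names `group_soft_threshold`, `project_to_vocab`, `attack_step`, `lr`, and `lam` are hypothetical, the gradient is supplied by the caller rather than computed from a real seq2seq model, and the gradient regularization term is omitted.

```python
import numpy as np

def group_soft_threshold(delta, threshold):
    """Proximal step for threshold * sum_i ||delta_i||_2, one group per word (row)."""
    norms = np.linalg.norm(delta, axis=1, keepdims=True)
    # Rows whose norm is at most `threshold` become zero; the rest shrink toward zero.
    scale = np.maximum(0.0, 1.0 - threshold / np.maximum(norms, 1e-12))
    return delta * scale

def project_to_vocab(perturbed, vocab_embeddings):
    """Snap each row to its nearest vocabulary embedding (Euclidean distance)."""
    diffs = perturbed[:, None, :] - vocab_embeddings[None, :, :]
    dists = np.sum(diffs ** 2, axis=2)      # shape: (seq_len, vocab_size)
    nearest = np.argmin(dists, axis=1)
    return vocab_embeddings[nearest], nearest

def attack_step(x, delta, grad, vocab_embeddings, lr=0.1, lam=0.5):
    """One hypothetical update: gradient step, group shrinkage, then projection.

    `grad` is assumed to be the gradient of the attack loss with respect to delta.
    """
    delta = delta - lr * grad
    delta = group_soft_threshold(delta, lr * lam)
    projected, ids = project_to_vocab(x + delta, vocab_embeddings)
    return projected - x, ids

# Toy usage: a 3-word vocabulary with 2-dimensional embeddings.
vocab = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
x = vocab[[0, 1]]                            # input sentence: word ids 0, 1
grad = np.array([[-10.0, 0.0], [0.01, 0.0]])
new_delta, new_ids = attack_step(x, np.zeros_like(x), grad, vocab)
changed = int(np.sum(np.linalg.norm(new_delta, axis=1) > 0))
```

In this toy run only the word with the large gradient is replaced. The row-wise shrinkage zeroes the small perturbation on the other word, which illustrates why a group-sparse penalty is a natural fit for keeping the number of changed words low.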

Publisher

Association for the Advancement of Artificial Intelligence (AAAI)

Subject

General Medicine

Cited by 52 articles.

1. LLMEffiChecker: Understanding and Testing Efficiency Degradation of Large Language Models;ACM Transactions on Software Engineering and Methodology;2024-08-26

2. Adversarial attacks and defenses for large language models (LLMs): methods, frameworks & challenges;International Journal of Multimedia Information Retrieval;2024-06-25

3. A survey of safety and trustworthiness of large language models through the lens of verification and validation;Artificial Intelligence Review;2024-06-17

4. Transferable Multimodal Attack on Vision-Language Pre-training Models;2024 IEEE Symposium on Security and Privacy (SP);2024-05-19

5. Reversible jump attack to textual classifiers with modification reduction;Machine Learning;2024-04-22