Textual Adversarial Training of Machine Learning Model for Resistance to Adversarial Examples

Author:

Kwon Hyun1ORCID,Lee Sanghyun2ORCID

Affiliation:

1. Department of Artificial Intelligence and Data Science, Korea Military Academy, 574 Hwarang-ro, Nowon-gu, Seoul 01819, Republic of Korea

2. Graduate School of Information Security, Korea Advanced Institute of Science and Technology, 291 Daehak-ro, Yuseong-gu, Daejeon 34141, Republic of Korea

Abstract

Deep neural networks provide good performance for image recognition, speech recognition, text recognition, and pattern recognition. However, such networks are vulnerable to attack by adversarial examples. Adversarial examples are created by adding a small amount of noise to an original sample in such a way that no problem is perceptible to humans, yet the sample will be incorrectly recognized by a model. Adversarial examples have been studied mainly in the context of images, but research has expanded to include the text domain. In the textual context, an adversarial example is a sample of text in which certain important words have been changed so that the sample will be misclassified by a model even though to humans it is the same as the original text in terms of meaning and grammar. In the text domain, there have been relatively few studies on defenses against adversarial examples compared with the number of studies on adversarial example attacks. In this paper, we propose an adversarial training method to defend against adversarial examples that target the latest text model, bidirectional encoder representations from transformers (BERT). In the proposed method, adversarial examples are generated using various parameters and then are applied in additional training of the target model to instill robustness against unknown adversarial examples. Experiments were conducted using five datasets (AG’s News, a movie review dataset, the IMDB Large Movie Review Dataset (IMDB), the Stanford Natural Language Inference (SNLI) corpus, and the Multi-Genre Natural Language Inference (MultiNLI) corpus), with TensorFlow as the machine learning library. According to the experimental results, the baseline model had an accuracy of 88.1% on the original sentences and an accuracy of 9.2% on the adversarial sentences, whereas the model that underwent the proposed training method maintained an average accuracy of 87.2% on the original sentences and had an average accuracy of 22.5% on the adversarial sentences.

Funder

Ministry of Education

Publisher

Hindawi Limited

Subject

Computer Networks and Communications,Information Systems

Reference39 articles.

1. Deep learning in neural networks: An overview

2. Very deep convolutional networks for large-scale image recognition;K. Simonyan;International Conference on Learning Representations,2015

3. Deep Neural Networks for Acoustic Modeling in Speech Recognition: The Shared Views of Four Research Groups

4. A unified architecture for natural language processing

5. Intriguing properties of neural networks;C. Szegedy;International Conference on Learning Representations,2014

Cited by 3 articles. 订阅此论文施引文献 订阅此论文施引文献,注册后可以免费订阅5篇论文的施引文献,订阅后可以查看论文全部施引文献

同舟云学术

1.学者识别学者识别

2.学术分析学术分析

3.人才评估人才评估

"同舟云学术"是以全球学者为主线,采集、加工和组织学术论文而形成的新型学术文献查询和分析系统,可以对全球学者进行文献检索和人才价值评估。用户可以通过关注某些学科领域的顶尖人物而持续追踪该领域的学科进展和研究前沿。经过近期的数据扩容,当前同舟云学术共收录了国内外主流学术期刊6万余种,收集的期刊论文及会议论文总量共计约1.5亿篇,并以每天添加12000余篇中外论文的速度递增。我们也可以为用户提供个性化、定制化的学者数据。欢迎来电咨询!咨询电话:010-8811{复制后删除}0370

www.globalauthorid.com

TOP

Copyright © 2019-2024 北京同舟云网络信息技术有限公司
京公网安备11010802033243号  京ICP备18003416号-3