Enhancing trustworthy deep learning for image classification against evasion attacks: a systematic literature review

Published: 2024-06-15 Issue: 7 Volume: 57 Page:
ISSN: 1573-7462
Container-title: Artificial Intelligence Review
Language: en
Short-container-title: Artif Intell Rev

Author:

Akhtom Dua’a Mkhiemir, Singh Manmeet Mahinderjit, XinYing Chew

Abstract

In the rapidly evolving field of Deep Learning (DL), the trustworthiness of models is essential for their effective application in critical domains like healthcare and autonomous systems. Trustworthiness in DL encompasses aspects such as reliability, fairness, and transparency, which are crucial for its real-world impact and acceptance. However, the development of trustworthy DL models faces significant challenges. This is notably due to adversarial examples, a sophisticated form of evasion attack in adversarial machine learning (AML), which subtly alter inputs to deceive these models and pose a major threat to their safety and reliability. The current body of research primarily focuses on defensive measures, such as enhancing the robustness of models or implementing explainable AI techniques. However, this approach often neglects to address the fundamental vulnerabilities that adversaries exploit. As a result, the field tends to concentrate more on counteracting measures rather than gaining an in-depth understanding of the vulnerabilities and attack strategies inherent in DL systems. This gap in comprehensive understanding impedes the formulation of effective defense mechanisms. This research aims to shift the focus from predominantly defensive strategies toward a more extensive comprehension of adversarial techniques and the innate vulnerabilities of DL models. We undertake this by conducting a thorough systematic literature review, encompassing 49 diverse studies from the previous decade. Our findings reveal the key characteristics of adversarial examples that enable their success against image classification-based DL models. Building on these insights, we propose the Transferable Pretrained Adversarial Deep Learning framework (TPre-ADL). This conceptual model aims to rectify the deficiencies in current defense strategies by incorporating the analyzed traits of adversarial examples, potentially enhancing the robustness and trustworthiness of DL models.

Publisher

Springer Science and Business Media LLC

Link

https://link.springer.com/content/pdf/10.1007/s10462-024-10777-4.pdf

References: 106 articles.

1. Agbo C, Mahmoud Q, Eklund J (2019) Blockchain technology in healthcare: a systematic review. Healthcare 7(2):56. https://doi.org/10.3390/healthcare7020056

2. Akhtar N, Mian A (2018) Threat of adversarial attacks on deep learning in computer vision: a survey. IEEE Access 6:14410–14430. https://doi.org/10.1109/access.2018.2807385

3. Alzantot M, Sharma Y, Chakraborty S et al (2019) GenAttack. In: Proceedings of the genetic and evolutionary computation conference. ACM. https://doi.org/10.1145/3321707.3321749

4. Angelov P, Soares E (2020) Towards explainable deep neural networks (XDNN). Neural Netw 130:185–194

5. Apley DW, Zhu J (2020) Visualizing the effects of predictor variables in black box supervised learning models. J R Stat Soc Ser B Stat Methodol 82(4):1059–1086