Affiliation:
1. Capitol Technology University, USA & University of Virginia, USA
Abstract
This study introduces “NeuroGuard,” an innovative defense mechanism designed to enhance the security of natural language processing (NLP) models against complex backdoor attacks. Diverging from traditional methodologies, NeuroGuard employs a sophisticated variant of the k-means clustering algorithm, meticulously crafted to detect and neutralize hidden backdoor triggers in data. This novel approach is universally adaptable, providing a robust safeguard across a wide range of NLP applications without sacrificing performance. Through rigorous experimentation and in-depth comparative analysis, NeuroGuard outperforms existing defense strategies, significantly reducing the effectiveness of backdoor attacks. This breakthrough in NLP model security represents a crucial step forward in protecting the integrity of language-based AI systems.
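The abstract does not detail NeuroGuard's specific k-means variant. The sketch below is only an illustration of the general idea behind clustering-based backdoor detection (as in Chen et al.'s activation clustering): embed the training samples, cluster the embeddings, and flag a small, tight cluster as likely poisoned. The synthetic data, the farthest-point initialization, and the size threshold are all illustrative assumptions, not the paper's method.

```python
import numpy as np

def kmeans(X, k=2, iters=50):
    """Plain Lloyd's k-means with a deterministic farthest-point init.

    Illustrative only: the init picks the data mean and the point farthest
    from it, which is enough to separate a tight outlying cluster in this toy.
    """
    c0 = X.mean(axis=0)
    c1 = X[np.linalg.norm(X - c0, axis=1).argmax()]
    centers = np.stack([c0, c1])
    for _ in range(iters):
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return labels, centers

def flag_suspicious_cluster(labels, max_frac=0.35):
    """Heuristic: poisoned samples often form a small cluster.

    Returns the index of the smallest cluster if it holds at most
    `max_frac` of the data, else None. The threshold is an assumption.
    """
    sizes = np.bincount(labels)
    j = int(sizes.argmin())
    return j if sizes[j] / labels.size <= max_frac else None

# Toy demo: 90 "clean" embeddings near the origin, 10 "poisoned" ones
# forming a tight cluster offset from the rest (stand-ins for sentence
# embeddings of trigger-bearing inputs).
rng = np.random.default_rng(1)
clean = rng.normal(0.0, 1.0, size=(90, 8))
poison = rng.normal(0.0, 0.05, size=(10, 8)) + 3.0
X = np.vstack([clean, poison])

labels, _ = kmeans(X, k=2)
suspect = flag_suspicious_cluster(labels)
flagged = np.where(labels == suspect)[0] if suspect is not None else np.array([])
```

In this toy setup the flagged indices are exactly the ten poisoned rows; a real defense would cluster per predicted class and inspect or retrain on the flagged samples rather than discard them blindly.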