Abstract
As artificial intelligence systems become more widely used, addressing their vulnerability to cyber-attacks has become crucial. In this study, we propose a novel gradient descent-based method for generating fake data that a targeted machine learning model accepts as positive. The method is designed to produce a large number of positive samples with a minimal number of probes to the model, which makes the attack difficult for security systems to detect. In addition, we use a reverse engineering approach to build an alternative model to the attacked model, trained on a dataset composed of the samples generated by our method. We evaluate both the proposed method and the alternative model experimentally on six distinct datasets, training a model on each with three separate machine learning algorithms, for a total of eighteen unique models. Results are assessed with metrics commonly used in the literature: effective attack rate (EAR), accuracy, precision, recall, and F1 score. In the EAR-oriented assessment, the method reaches a notably high EAR of 97% for the combination of the kNN algorithm and the Cancer dataset. Overall, the experimental results indicate that the proposed method is highly effective as a data-driven attack.
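To make the idea of a probe-limited, gradient-based attack concrete, the following is a minimal sketch and not the method evaluated in this study. It assumes the attacker can only query a positive-class score, estimates the gradient by forward finite differences, and takes EAR to mean the fraction of generated samples that the target accepts as positive. The class `BlackBoxTarget`, its hidden weights, and the functions `estimate_gradient`, `craft_positive`, and `effective_attack_rate` are all hypothetical names introduced for illustration.

```python
import math
import random


class BlackBoxTarget:
    """Hypothetical stand-in for the attacked model (assumption).

    It exposes only a positive-class score and counts every probe.
    """

    def __init__(self, weights, bias):
        self._w = weights
        self._b = bias
        self.probes = 0

    def score(self, x):
        self.probes += 1
        z = sum(w * xi for w, xi in zip(self._w, x)) + self._b
        return 1.0 / (1.0 + math.exp(-z))


def estimate_gradient(target, x, eps=1e-2):
    """Forward finite-difference gradient of the score; costs len(x) + 1 probes."""
    base = target.score(x)
    grad = []
    for i in range(len(x)):
        shifted = list(x)
        shifted[i] += eps
        grad.append((target.score(shifted) - base) / eps)
    return base, grad


def craft_positive(target, x0, step=1.0, max_iters=50, threshold=0.5):
    """Gradient descent on the loss (1 - score), i.e. ascent on the score."""
    x = list(x0)
    for _ in range(max_iters):
        base, grad = estimate_gradient(target, x)
        if base >= threshold:
            return x, True  # the current sample is already accepted
        norm = math.sqrt(sum(g * g for g in grad)) or 1.0
        x = [xi + step * g / norm for xi, g in zip(x, grad)]
    return x, target.score(x) >= threshold


def effective_attack_rate(accepted_flags):
    """Assumed definition: share of generated samples accepted as positive."""
    return sum(accepted_flags) / len(accepted_flags)


random.seed(0)
target = BlackBoxTarget(weights=[1.5, -2.0, 0.5], bias=-1.0)
results = [
    craft_positive(target, [random.uniform(-1, 1) for _ in range(3)])
    for _ in range(20)
]
ear = effective_attack_rate([ok for _, ok in results])
print(f"EAR = {ear:.2f}, probes used = {target.probes}")
```

Counting probes inside the target makes the trade-off explicit: each finite-difference gradient costs one probe per feature plus one, so a larger step size or fewer iterations directly lowers the query budget that a monitoring system could observe.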