Abstract
Because training a deep learning model requires substantial resources, the trained model is regarded as the intellectual property (IP) of its trainer. However, researchers have shown that deep learning models are vulnerable to model-stealing attacks. Recently, a method that detects model theft by embedding external features has been proposed. We argue that this method is similar to backdooring the model and still falls short of protecting model IP. In this paper, we present a method that removes external features to evade such detection. First, we obtain clean data through model inversion and adaptive selection. We then use this data to prune the neurons associated with external features while keeping the model's accuracy largely unchanged. Experimental results demonstrate that our method helps different types of adversaries evade ownership verification based on external features. In addition, our method further strengthens the attack against watermark-based defenses.