Abstract
The embedding and extraction of knowledge is a recent trend in machine learning applications, e.g., to supplement small training datasets. Meanwhile, with the increasing use of machine learning models in security-critical applications, the embedding and extraction of malicious knowledge correspond to the notorious backdoor attack and its defence, respectively. This paper studies the embedding and extraction of knowledge in tree ensemble classifiers, focusing on knowledge expressible in a generic form of Boolean formulas, e.g., point-wise robustness and backdoor attacks. The embedding is required to be preservative (the original performance of the classifier is preserved), verifiable (the knowledge can be attested), and stealthy (the embedding cannot be easily detected). To this end, we propose two novel and effective embedding algorithms, one for black-box settings and the other for white-box settings; both run in PTIME. Beyond the embedding, we develop an algorithm to extract the embedded knowledge by reducing the problem to one solvable with an SMT (satisfiability modulo theories) solver. While this algorithm can successfully extract the knowledge, the reduction leads to an NP computation. Therefore, if the embedding is applied as a backdoor attack and the extraction as a defence, our results suggest a complexity gap (P vs. NP) between the attack and the defence for tree ensemble classifiers. We apply our algorithms to a diverse set of datasets to validate our conclusions extensively.
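To make the two sides of the abstract concrete, here is a minimal, hedged sketch, not the paper's actual algorithms: it grafts a Boolean trigger (a conjunction of threshold tests) onto a toy decision tree so that clean behaviour is preserved while trigger inputs are rerouted to an attacker-chosen label, and then poses the extraction side as a satisfiability query to an SMT solver (z3). All names here (embed_trigger, predict, the trigger region) are hypothetical illustrations.

from z3 import Real, And, Solver, sat

# A stump: if x0 <= 0.5 predict class 0, else class 1.
tree = {"feat": 0, "thr": 0.5,
        "left": {"label": 0}, "right": {"label": 1}}

def embed_trigger(node, feat, thr, target):
    """Graft a test on `feat` above `node` so inputs with
    x[feat] > thr are rerouted to the attacker's `target` label;
    all other inputs still reach the original subtree."""
    return {"feat": feat, "thr": thr,
            "left": node, "right": {"label": target}}

backdoored = embed_trigger(tree, feat=1, thr=0.9, target=0)

def predict(node, x):
    # Walk the tree until a leaf (a node carrying a "label") is reached.
    while "label" not in node:
        node = node["left"] if x[node["feat"]] <= node["thr"] else node["right"]
    return node["label"]

assert predict(backdoored, [0.9, 0.2]) == 1   # clean behaviour preserved
assert predict(backdoored, [0.9, 0.95]) == 0  # trigger flips the class

# Extraction side (sketched): ask an SMT solver whether some input lies in
# the trigger region x1 > 0.9 while the clean model would output class 1.
x0, x1 = Real("x0"), Real("x1")
s = Solver()
s.add(And(x1 > 0.9, x0 > 0.5))
print(s.check() == sat)  # sat: a witness input exists

The grafting step runs in time linear in the trigger size, which mirrors the PTIME claim for embedding; the extraction query is a satisfiability problem, which mirrors the NP side of the complexity gap described in the abstract.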
Publisher
Springer Science and Business Media LLC
Subject
Artificial Intelligence, Software
Cited by
5 articles.