Abstract
The success of machine learning in real-world use cases has increased its demand in mission-critical applications such as autonomous vehicles, healthcare and medical diagnosis, aviation and flight safety, natural disaster prediction, and early warning systems. Adaptive Boosting (AdaBoost) is an ensemble learning method that has gained considerable traction in such applications. Since AdaBoost is inherently a non-interpretable model, its interpretability has been a research topic for many years. Furthermore, most research to date has aimed at explaining AdaBoost using perturbation-based techniques. This paper presents a technique to interpret the AdaBoost algorithm from a data perspective using deletion diagnostics and Cook’s distance. The technique achieves interpretability by detecting the most influential data instances and quantifying their impact on the feature importance of the model. This interpretability enables domain experts to accurately modify the significance of specific features in a trained AdaBoost model depending on the data instances. Unlike perturbation-based explanations, interpreting from a data perspective makes it possible to debug data-related biases and errors and to impart the knowledge of domain experts into the model through domain-aware fine-tuning. Experimental studies were conducted on diverse real-world multi-feature datasets to demonstrate interpretability and knowledge integration through domain-aware fine-tuning.
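To make the data-perspective idea concrete, the following is a minimal sketch of deletion diagnostics applied to AdaBoost feature importances: retrain the model with each instance left out and score that instance with a Cook’s-distance-style statistic computed on the change in the feature-importance vector rather than on fitted values. The dataset choice, hyperparameters, the scale term `s2`, and the subsample size are illustrative assumptions, not the paper’s exact formulation.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import AdaBoostClassifier

# A real-world multi-feature dataset (assumed here for illustration).
X, y = load_breast_cancer(return_X_y=True)


def fit_importances(X, y):
    """Train AdaBoost and return its feature-importance vector."""
    model = AdaBoostClassifier(n_estimators=50, random_state=0)
    model.fit(X, y)
    return model.feature_importances_


# Baseline feature importances from the full training set.
base = fit_importances(X, y)
p = X.shape[1]
s2 = base.var() + 1e-12  # assumed scale term; guards against division by zero

# Deletion diagnostics: retrain with instance i removed and measure how far
# the feature-importance vector moves, normalized Cook's-distance style.
n_check = 30  # leave-one-out retraining is costly; subsample for this demo
scores = np.empty(n_check)
for i in range(n_check):
    deleted = fit_importances(np.delete(X, i, axis=0), np.delete(y, i))
    scores[i] = np.sum((base - deleted) ** 2) / (p * s2)

# The highest-scoring instances are the most influential ones a domain
# expert would inspect before adjusting feature significance.
top = np.argsort(scores)[::-1][:5]
print("Most influential instances:", top)
print("Influence scores:", scores[top])
```

In this sketch, instances with large scores are the candidates whose removal most reshapes which features the trained ensemble relies on, which is the signal the abstract describes using for domain-aware fine-tuning.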
Publisher
Springer Science and Business Media LLC