Abstract
Interactions between predictors play an important role in many applications. Popular and successful tree-based supervised learning methods such as random forests and logic regression can incorporate interactions associated with the outcome of interest without requiring the interacting variables to be specified in advance. Nonetheless, these algorithms have certain drawbacks: random forests offer limited interpretability of model predictions and struggle with predictors whose marginal effects are negligible, while logic regression cannot incorporate interactions with continuous variables, is restricted to additive structures between Boolean terms, and does not directly consider the conjunctions that reveal the interactions. We therefore propose a novel method called logic decision trees (logicDT) that is specifically tailored to binary input data and helps to overcome the drawbacks of existing methods. The main idea is to consider sets of Boolean conjunctions, to use these terms as input variables for decision trees, and to search for the best-performing model. logicDT is also accompanied by a framework for estimating the importance of identified terms, i.e., of input variables and interactions between input variables. The new method is compared with other popular statistical learning algorithms in simulations and real-data applications. As these evaluations show, logicDT yields high prediction performance while maintaining interpretability.
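The core idea of the abstract — forming Boolean conjunctions of binary predictors and searching for the ones that best explain the outcome — can be illustrated with a minimal sketch. This is not the logicDT algorithm itself (which fits full decision trees over conjunction sets via a stochastic search); here, as a stand-in, an exhaustive search simply scores each conjunction of up to `max_order` variables by how often it agrees with a binary outcome. All names (`conjunction_features`, `best_conjunction`) are illustrative, not part of the logicDT software.

```python
from itertools import combinations

def conjunction_features(X, max_order=2):
    """Generate all Boolean conjunctions of up to max_order binary inputs.

    X is a list of rows of 0/1 values. Returns a dict mapping a tuple of
    variable indices to the column of conjunction values (1 iff all
    selected variables are 1 in that row).
    """
    p = len(X[0])
    feats = {}
    for order in range(1, max_order + 1):
        for idxs in combinations(range(p), order):
            feats[idxs] = [int(all(row[j] for j in idxs)) for row in X]
    return feats

def best_conjunction(X, y, max_order=2):
    """Toy search: return the (indices, column) pair whose conjunction
    values agree most often with the binary outcome y."""
    feats = conjunction_features(X, max_order)
    def accuracy(col):
        return sum(int(c == t) for c, t in zip(col, y)) / len(y)
    return max(feats.items(), key=lambda kv: accuracy(kv[1]))

# Toy data: the outcome is 1 exactly when X1 AND X3 are both 1,
# so the interaction (0, 2) should be recovered.
X = [(1, 0, 1), (1, 1, 1), (0, 1, 1), (1, 0, 0), (0, 0, 1), (1, 1, 0)]
y = [int(a and c) for a, b, c in X]
idxs, col = best_conjunction(X, y)
print(idxs)  # prints (0, 2), i.e. the conjunction X1 AND X3
```

In logicDT, such conjunctions are not scored in isolation but serve as the input variables of a decision tree, so that interactions between the conjunctions themselves can also be captured; the exhaustive enumeration above is only feasible for small `max_order` and few predictors.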
Funder
Deutsche Forschungsgemeinschaft
Heinrich-Heine-Universität Düsseldorf
Publisher
Springer Science and Business Media LLC
Subject
Artificial Intelligence, Software