Abstract
Rule learning methods have a long history of active research in the machine learning community. They are not only a common choice in applications that demand human-interpretable classification models but have also been shown to achieve state-of-the-art performance when used in ensemble methods. Unfortunately, little information can be found in the literature about the various implementation details that are crucial for the efficient induction of rule-based models. This work provides a detailed discussion of algorithmic concepts and approximations that enable applying rule learning techniques to large amounts of data. To demonstrate the advantages and limitations of these individual concepts in a series of experiments, we rely on BOOMER—a flexible and publicly available implementation for the efficient induction of gradient boosted single- or multi-label classification rules.
Funder
Deutsche Forschungsgemeinschaft
Ludwig-Maximilians-Universität München
Publisher
Springer Science and Business Media LLC
Subject
Applied Mathematics, Computer Science Applications, Statistics and Probability