Fitness evaluation reuse for accelerating GPU-based evolutionary induction of decision trees

Published:2020-09-15 Issue:1 Volume:35 Page:20-32
ISSN:1094-3420
Container-title:The International Journal of High Performance Computing Applications
language:en

Author:

Jurczuk Krzysztof¹, Czajkowski Marcin¹, Kretowski Marek¹

Affiliation:

1. Faculty of Computer Science, Bialystok University of Technology, Bialystok, Poland

Abstract

Decision trees (DTs) are one of the most popular white-box machine-learning techniques. Traditionally, DTs are induced using a top-down greedy search that may lead to sub-optimal solutions. One emerging alternative is evolutionary induction, inspired by biological evolution. It searches for the tree structure and the tests simultaneously, which results in less complex DTs with at least comparable prediction performance. However, the evolutionary search is computationally expensive, and its effective application to big data mining needs algorithmic and technological progress. In this paper, noting that many trees or their parts reappear during the evolution, we propose a reuse strategy. A fixed number of recently processed individuals (DTs) are stored in a so-called repository. One part of each repository entry (related to fitness calculations) is maintained on the CPU side to limit CPU/GPU memory transactions. The rest of the entry (the tree structures) is located on the GPU side to speed up the search for similar DTs. As the most time-demanding task of the induction is the evaluation of DTs, the GPU first searches the repository for similar DTs to reuse. If the search fails, the GPU has to evaluate the DT from scratch. Large artificial and real-life datasets and various repository strategies are tested. Results show that reusing information from previous generations can further accelerate the original GPU-based solution. The gain is especially visible for large-scale data. To give an idea of the overall acceleration scale, the proposed solution can process billions of objects in a few hours on a single GPU workstation.
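The reuse idea in the abstract can be illustrated with a minimal CPU-only sketch. It is not the authors' implementation: the class name `FitnessRepository`, the nested-tuple tree encoding, the use of accuracy as fitness, and the least-recently-used eviction are all assumptions made for illustration. The sketch also reuses a result only on an exact structural match, whereas the paper searches for similar DTs and splits each repository entry between CPU and GPU memory.

```python
from collections import OrderedDict


def predict(tree, x):
    """Walk a tree encoded as nested tuples (hypothetical encoding).

    Internal node: ("split", feature_index, threshold, left, right)
    Leaf:          ("leaf", label)
    """
    while tree[0] == "split":
        _, feature, threshold, left, right = tree
        tree = left if x[feature] <= threshold else right
    return tree[1]


def accuracy(tree, data):
    """Full evaluation from scratch: fraction of correctly classified objects."""
    correct = sum(1 for x, y in data if predict(tree, x) == y)
    return correct / len(data)


class FitnessRepository:
    """Fixed-size store of recently evaluated trees (illustrative only)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._entries = OrderedDict()  # tree -> fitness, oldest first
        self.hits = 0
        self.misses = 0

    def evaluate(self, tree, full_evaluation):
        # Exact-match lookup; an assumption, simpler than a similarity search.
        if tree in self._entries:
            self.hits += 1
            self._entries.move_to_end(tree)  # mark as recently used
            return self._entries[tree]
        # Miss: evaluate from scratch and remember the result.
        self.misses += 1
        fitness = full_evaluation(tree)
        self._entries[tree] = fitness
        if len(self._entries) > self.capacity:
            self._entries.popitem(last=False)  # evict the oldest entry
        return fitness


if __name__ == "__main__":
    data = [((0.2,), 0), ((0.8,), 1), ((0.4,), 0), ((0.9,), 1)]
    tree = ("split", 0, 0.5, ("leaf", 0), ("leaf", 1))
    repo = FitnessRepository(capacity=2)
    for _ in range(3):  # the same tree reappears across generations
        repo.evaluate(tree, lambda t: accuracy(t, data))
    print(repo.hits, repo.misses)
```

In this toy setting the saving per hit is one pass over the training data, which is why reuse matters more as the dataset grows.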

Funder

Polish Ministry of Science and Higher Education

Publisher

SAGE Publications

Subject

Hardware and Architecture, Theoretical Computer Science, Software

Link

http://journals.sagepub.com/doi/pdf/10.1177/1094342020957393
