PLANET: Massively Parallel Learning of Tree Ensembles with MapReduce

Published:2009-08 Issue:2 Volume:2 Page:1426-1437
ISSN:2150-8097
Container-title:Proceedings of the VLDB Endowment
language:en
Short-container-title:Proc. VLDB Endow.

Author:

Biswanath Panda¹, Joshua S. Herbach¹, Sugato Basu¹, Roberto J. Bayardo¹

Affiliation:

1. Google, Inc.

Abstract

Classification and regression tree learning on massive datasets is a common data mining task at Google, yet many state-of-the-art tree learning algorithms require training data to reside in memory on a single machine. While more scalable implementations of tree learning have been proposed, they typically require specialized parallel computing architectures. In contrast, the majority of Google's computing infrastructure is based on commodity hardware. In this paper, we describe PLANET: a scalable distributed framework for learning tree models over large datasets. PLANET defines tree learning as a series of distributed computations, and implements each one using the MapReduce model of distributed computation. We show how this framework supports scalable construction of classification and regression trees, as well as ensembles of such models. We discuss the benefits and challenges of using a MapReduce compute cluster for tree learning, and demonstrate the scalability of this approach by applying it to a real-world learning task from the domain of computational advertising.
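
The abstract does not give the details of PLANET's computations, so the following is only a rough illustration of the general idea of expressing one tree-learning step as a map phase and a reduce phase. It is a minimal single-process sketch in Python, not the authors' implementation: the function names (map_partition, reduce_stats, best_split), the fixed list of candidate thresholds, and the squared-error split criterion for a regression tree are all assumptions made for the example, and no MapReduce library is used.

```python
from collections import defaultdict


def _add(acc, y):
    # Accumulate sufficient statistics: count, sum of y, sum of y squared.
    acc[0] += 1
    acc[1] += y
    acc[2] += y * y


def map_partition(rows, thresholds):
    """Map step (hypothetical): scan one data partition of (x, y) rows and
    emit statistics for the left branch of each candidate split x < t,
    plus totals for the whole partition under the key "total"."""
    stats = defaultdict(lambda: [0, 0.0, 0.0])
    for x, y in rows:
        _add(stats["total"], y)
        for t in thresholds:
            if x < t:
                _add(stats[t], y)
    return dict(stats)


def reduce_stats(partials):
    """Reduce step (hypothetical): sum the per-partition statistics by key."""
    merged = defaultdict(lambda: [0, 0.0, 0.0])
    for partial in partials:
        for key, (n, s, s2) in partial.items():
            merged[key][0] += n
            merged[key][1] += s
            merged[key][2] += s2
    return dict(merged)


def sse(n, s, s2):
    # Sum of squared errors around the mean, from sufficient statistics.
    return s2 - s * s / n if n else 0.0


def best_split(merged, thresholds):
    """Pick the threshold with the lowest total squared error.
    Right-branch statistics are derived as total minus left."""
    n, s, s2 = merged["total"]
    best = None
    for t in thresholds:
        ln, ls, ls2 = merged.get(t, [0, 0.0, 0.0])
        rn, rs, rs2 = n - ln, s - ls, s2 - ls2
        if ln == 0 or rn == 0:
            continue  # a split must send rows to both branches
        cost = sse(ln, ls, ls2) + sse(rn, rs, rs2)
        if best is None or cost < best[1]:
            best = (t, cost)
    return best


# Two toy "partitions" standing in for data spread across machines.
partitions = [[(1, 1.0), (2, 1.0)], [(3, 5.0), (4, 5.0)]]
thresholds = [1.5, 2.5, 3.5]
merged = reduce_stats(map_partition(p, thresholds) for p in partitions)
print(best_split(merged, thresholds))
```

In this sketch each partition is scanned independently and only small per-threshold summaries are combined, which is why no single process needs to hold the full training set in memory.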

Publisher

VLDB Endowment

Subject

General Earth and Planetary Sciences; Water Science and Technology; Geography, Planning and Development

Link

https://dl.acm.org/doi/pdf/10.14778/1687553.1687569

Cited by 150 articles.

1. Technical note: Monitoring discharge of mountain streams by retrieving image features with deep learning;Hydrology and Earth System Sciences;2024-09-10

2. Parallel approaches for a decision tree-based explainability algorithm;Future Generation Computer Systems;2024-09

3. Machine Learning Assisted State-of-the-Art-of Petrographic Classification From Geophysical Logs;Pure and Applied Geophysics;2024-08-31

4. Integrating hybrid weak learners for lithofacies classification using well log data;2024 8th International Conference on Image and Signal Processing and their Applications (ISPA);2024-04-21

5. Fast Search-by-Classification for Large-Scale Databases Using Index-Aware Decision Trees and Random Forests;Proceedings of the VLDB Endowment;2023-07