Variable screening for Lasso based on multidimensional indexing

Published: 2023-08-27, Issue: 1, Volume: 38, Pages: 49-78
ISSN: 1384-5810
Container-title: Data Mining and Knowledge Discovery
Language: en
Short-container-title: Data Min Knowl Disc

Author:

Żogała-Siudem Barbara, Jaroszewicz Szymon

Abstract

In this paper we present a correlation-based safe screening technique for building the complete Lasso path. Unlike many other Lasso screening approaches, we do not consider prespecified values of the regularization parameter; instead, we prune variables which cannot be the next best feature to be added to the model. Based on those results, we present a modified homotopy algorithm for computing the regularization path. We demonstrate that, even though our algorithm provides the complete Lasso path, its performance is competitive with state-of-the-art algorithms which, however, only provide solutions at a prespecified sample of regularization parameters. We also address problems of extremely high dimensionality, where the variables may not fit into main memory and are assumed to be stored on disk. A multidimensional index is used to quickly retrieve potentially relevant variables. We apply the approach to the important case when multiple models are built against a fixed set of variables, a situation frequently encountered in statistical databases. We perform experiments using the complete Eurostat database as predictors and demonstrate that our approach allows for practical and efficient construction of Lasso models, which remain accurate and interpretable even when millions of highly correlated predictors are present.
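
To make the idea of "the next best feature to be added" concrete, the sketch below computes only the very first step of a Lasso path. It is not the screening rule, the modified homotopy algorithm, or the multidimensional index from the paper. It assumes the objective 0.5 * ||y - Xb||^2 + lam * ||b||_1 with a centered response and centered, unit-norm predictor columns; under those assumptions the first variable to enter is the one with the largest absolute inner product with y. The names `first_entering_variable` and `standardize` are hypothetical helpers introduced for illustration.

```python
import numpy as np


def first_entering_variable(X, y):
    """Return (index, lambda_max) for the start of the Lasso path.

    Assumes the objective 0.5 * ||y - X b||^2 + lam * ||b||_1 with
    centered y and centered, unit-norm columns of X (an assumption of
    this sketch, not a statement about the paper's setup).
    """
    scores = np.abs(X.T @ y)      # |x_j^T y| for every variable
    j = int(np.argmax(scores))    # variable that enters the model first
    return j, float(scores[j])    # lambda_max = max_j |x_j^T y|


def standardize(X):
    """Center each column and scale it to unit Euclidean norm."""
    Xc = X - X.mean(axis=0)
    return Xc / np.linalg.norm(Xc, axis=0)


# Synthetic data: the response depends almost entirely on column 3.
rng = np.random.default_rng(0)
X = standardize(rng.normal(size=(200, 50)))
y = 5.0 * X[:, 3] + 0.01 * rng.normal(size=200)
y = y - y.mean()

j, lam_max = first_entering_variable(X, y)
print(j, lam_max)
```

This brute-force scan touches every column. The abstract describes avoiding exactly that cost by pruning variables that cannot be the next to enter and by retrieving candidates through an index.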

Publisher

Springer Science and Business Media LLC

Subject

Computer Networks and Communications, Computer Science Applications, Information Systems

Link

https://link.springer.com/content/pdf/10.1007/s10618-023-00950-8.pdf
