Abstract
AbstractThe design of regression models that are not affected by outliers is an important task which has been subject of numerous papers within the statistics community for the last decades. Prominent examples of robust regression models are least trimmed squares (LTS), where the k largest squared deviations are ignored, and least trimmed absolute deviations (LTA) which ignores the k largest absolute deviations. The numerical complexity of both models is driven by the number of binary variables and by the value k of ignored deviations. We introduce leveraged least trimmed absolute deviations (LLTA) which exploits that LTA is already immune against y-outliers. Therefore, LLTA has only to be guarded against outlying values in x, so-called leverage points, which can be computed beforehand, in contrast to y-outliers. Thus, while the mixed-integer formulations of LTS and LTA have as many binary variables as data points, LLTA only needs one binary variable per leverage point, resulting in a significant reduction of binary variables. Based on 11 data sets from the literature, we demonstrate that (1) LLTA’s prediction quality improves much faster than LTS and as fast as LTA for increasing values of k and (2) that LLTA solves the benchmark problems about 80 times faster than LTS and about five times faster than LTA, in median.
Funder
Karlsruher Institut für Technologie (KIT)
Publisher
Springer Science and Business Media LLC
Subject
Management Science and Operations Research,Business, Management and Accounting (miscellaneous)
Reference49 articles.
1. Aggarwal CC, Hinneburg A, Keim DA (2001) On the surprising behavior of distance metrics in high dimensional space. In: Van den Bussche J, Vianu V (eds) Database theory—ICDT 2001. Springer, Berlin, pp 420–434
2. Bassett GW Jr (1991) Equivariant, monotonic, 50% breakdown estimators. Am Stat 45(2):135–137
3. Bernholt T (2006) Robust estimators are hard to compute. Tech. rep
4. Bertsimas D, Dunn J (2019) Machine learning under a modern optimization lens. Dynamic Ideas LLC. https://books.google.de/books?id=g3ZWygEACAAJ
5. Bertsimas D, King A, Mazumder R (2016) Best subset selection via a modern optimization lens. Ann Stat 44:813–852
Cited by
4 articles.
订阅此论文施引文献
订阅此论文施引文献,注册后可以免费订阅5篇论文的施引文献,订阅后可以查看论文全部施引文献