
DIFF

Published: 2018-12
Volume: 12
Issue: 4
Pages: 419-432
ISSN: 2150-8097
Container-title: Proceedings of the VLDB Endowment
Language: en
Short-container-title: Proc. VLDB Endow.

Authors:

Firas Abuzaid¹, Peter Kraft¹, Sahaana Suri¹, Edward Gan¹, Eric Xu¹, Atul Shenoy¹, Asvin Ananthanarayan¹, John Sheu¹, Erik Meijer², Xi Wu³, Jeff Naughton³, Peter Bailis¹, Matei Zaharia¹

Affiliations:

1. Microsoft

2. Facebook

3. Google

Abstract

A range of explanation engines assist data analysts by performing feature selection over increasingly high-volume and high-dimensional data, grouping and highlighting commonalities among data points. While useful in diverse tasks such as user behavior analytics, operational event processing, and root cause analysis, today's explanation engines are designed as standalone data processing tools that do not interoperate with traditional, SQL-based analytics workflows; this limits the applicability and extensibility of these engines. In response, we propose the DIFF operator, a relational aggregation operator that unifies the core functionality of these engines with declarative relational query processing. We implement both single-node and distributed versions of the DIFF operator in MB SQL, an extension of MacroBase, and demonstrate how DIFF can provide the same semantics as existing explanation engines while capturing a broad set of production use cases in industry, including at Microsoft and Facebook. Additionally, we illustrate how this declarative approach to data explanation enables new logical and physical query optimizations. We evaluate these optimizations on several real-world production applications, and find that DIFF in MB SQL can outperform state-of-the-art engines by up to an order of magnitude.
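The abstract describes DIFF only at a high level. As a rough illustration, and not the MB SQL implementation, the following Python sketch compares a test relation against a control relation: it counts every attribute-value combination of up to `max_order` attributes in each relation and keeps the combinations that pass support and risk-ratio thresholds. The function name `diff`, its parameters, and the use of support and risk ratio as the difference metrics are assumptions made for this example (they are the metrics commonly associated with MacroBase); the sketch omits the paper's logical and physical optimizations and any pruning of redundant combinations.

```python
from collections import Counter
from itertools import combinations


def diff(test_rows, control_rows, attrs, max_order=2,
         min_support=0.1, min_risk_ratio=2.0):
    """Hypothetical DIFF-style comparison of two relations.

    test_rows and control_rows are lists of dicts (one dict per tuple).
    Returns (combination, support, risk_ratio) triples, best first.
    """
    n_test, n_control = len(test_rows), len(control_rows)

    def count(rows):
        # Group-by count for every attribute-value combination of
        # 1..max_order attributes (one GROUP BY per attribute subset).
        counts = Counter()
        for row in rows:
            for k in range(1, max_order + 1):
                for combo in combinations(attrs, k):
                    counts[tuple((a, row[a]) for a in combo)] += 1
        return counts

    test_counts, control_counts = count(test_rows), count(control_rows)
    results = []
    for key, a_t in test_counts.items():
        a_c = control_counts.get(key, 0)
        # Support: fraction of test rows that match this combination.
        support = a_t / n_test
        if support < min_support:
            continue
        # Risk ratio: share of test rows among rows matching the combination,
        # divided by the share of test rows among rows that do not match it.
        rest_t, rest_c = n_test - a_t, n_control - a_c
        if rest_t + rest_c == 0:
            continue  # every row matches, so there is nothing to contrast
        matched_rate = a_t / (a_t + a_c)
        unmatched_rate = rest_t / (rest_t + rest_c)
        risk_ratio = (float("inf") if unmatched_rate == 0
                      else matched_rate / unmatched_rate)
        if risk_ratio >= min_risk_ratio:
            results.append((dict(key), support, risk_ratio))
    # Highest risk ratio first, ties broken by higher support.
    return sorted(results, key=lambda r: (-r[2], -r[1]))
```

Calling `diff(test_rows, control_rows, ["app", "os"])` on two lists of dictionaries returns `(combination, support, risk_ratio)` tuples sorted by descending risk ratio. Expressing the comparison as grouped counts over both inputs is what lets a relational engine treat it as an aggregation operator.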

Publisher

VLDB Endowment

Subject

General Earth and Planetary Sciences, Water Science and Technology, Geography, Planning and Development

Link

https://dl.acm.org/doi/pdf/10.14778/3297753.3297761

Cited by 14 articles.

1. TSExplain: Explaining Aggregated Time Series by Surfacing Evolving Contributors; 2023 IEEE 39th International Conference on Data Engineering (ICDE); 2023-04

2. Mitigating Bias in Algorithmic Systems—A Fish-eye View; ACM Computing Surveys; 2022-12-03

3. Automated relational data explanation using external semantic knowledge; Proceedings of the VLDB Endowment; 2022-08

4. Towards causal physical error discovery in video analytics systems; Proceedings of the Workshop on Human-In-the-Loop Data Analytics; 2022-06-12

5. Interactive Query Explanations Using Fine Grained Provenance; Proceedings of the 2022 International Conference on Management of Data; 2022-06-10