TabEE: Tabular Embeddings Explanations

Published: 2024-03-12
Volume: 2
Issue: 1
Pages: 1-26
ISSN: 2836-6573
Container title: Proceedings of the ACM on Management of Data
Language: English
Short container title: Proc. ACM Manag. Data

Authors:

Copul Roni¹, Frost Nave², Milo Tova¹, Razmadze Kathy¹

Affiliations:

1. Tel Aviv University, Tel Aviv, Israel

2. eBay Research, Netanya, Israel

Abstract

Tabular embedding methods have become increasingly popular due to their effectiveness in improving the results of various tasks, including classic database tasks and machine learning predictions. However, most current methods treat these embedding models as "black boxes", making it difficult to understand the insights the models capture. Our research proposes a novel approach to interpreting these models, aiming to provide local and global explanations for the original data and to detect potential flaws in the embedding models. The proposed solution applies to any tabular embedding algorithm, since it treats the embedding model as a black box. Furthermore, we propose methods for comparing different embedding models, which can help identify data biases that might undermine the models' credibility without the user's knowledge. We evaluate our approach on multiple datasets and multiple embedding models, demonstrating that the proposed explanations provide valuable insights into the behavior of tabular embedding methods. By making these models more transparent, we believe our research will contribute to the development of more effective and reliable embedding methods for a wide range of applications.
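Because the approach treats the embedding model as a black box, an explainer only ever sees the table rows and the vectors the model produced for them. The Python below is a minimal sketch of that setting under stated assumptions. It is not TabEE's algorithm (the authors' code is in the TabEE git repository listed in the references). It clusters the embedding vectors with plain k-means, describes each cluster by the attribute=value pairs with the highest lift (a simple global explanation), and compares two embeddings through the Rand index of their clusterings. The toy rows, vectors, and helper names (`kmeans`, `explain_clusters`, `rand_index`) are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def farthest_first(vectors, k):
    """Deterministic seeding (assumption): start at the first vector, then
    repeatedly add the vector farthest from all centroids chosen so far."""
    centroids = [list(vectors[0])]
    while len(centroids) < k:
        nxt = max(vectors, key=lambda v: min(math.dist(v, c) for c in centroids))
        centroids.append(list(nxt))
    return centroids


def kmeans(vectors, k, iters=20):
    """Lloyd's k-means over black-box embedding vectors; one label per row."""
    centroids = farthest_first(vectors, k)
    labels = None
    for _ in range(iters):
        # Assign every vector to its nearest centroid.
        new = [min(range(k), key=lambda c: math.dist(v, centroids[c])) for v in vectors]
        if new == labels:
            break  # assignments are stable
        labels = new
        # Move each centroid to the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def explain_clusters(rows, labels, top=2):
    """Describe each cluster by the attribute=value pairs most over-represented in it."""
    n = len(rows)
    overall = Counter(pair for row in rows for pair in row.items())
    result = {}
    for c in sorted(set(labels)):
        members = [row for row, lab in zip(rows, labels) if lab == c]
        inside = Counter(pair for row in members for pair in row.items())

        def lift(pair):
            # P(attr=value | cluster) / P(attr=value) over the whole table
            return (inside[pair] / len(members)) / (overall[pair] / n)

        # Highest lift first; ties broken by the (attribute, value) pair itself.
        result[c] = sorted(inside, key=lambda p: (-lift(p), p))[:top]
    return result


def rand_index(a, b):
    """Fraction of row pairs on which two clusterings agree (same vs. different cluster)."""
    pairs = list(combinations(range(len(a)), 2))
    agree = sum((a[i] == a[j]) == (b[i] == b[j]) for i, j in pairs)
    return agree / len(pairs)


# Hypothetical toy table and the vectors two hypothetical embedding models assign to it.
rows = [
    {"airline": "AA", "delay": "high"},
    {"airline": "AA", "delay": "high"},
    {"airline": "UA", "delay": "low"},
    {"airline": "UA", "delay": "low"},
]
emb_a = [[0.0, 0.1], [0.1, 0.0], [5.0, 5.1], [5.1, 5.0]]
emb_b = [[0.0, 0.0], [5.0, 5.0], [0.1, 0.1], [5.1, 5.1]]

labels_a = kmeans(emb_a, 2)
labels_b = kmeans(emb_b, 2)
print(explain_clusters(rows, labels_a))
print(rand_index(labels_a, labels_b))
```

Here embedding A separates the rows by airline and delay, while embedding B pairs an AA row with a UA row, so the two clusterings agree on only a third of the row pairs. A low agreement score like this is one cheap signal that two embeddings capture different structure and that their cluster descriptions deserve a closer look.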

Funders

BSF - the US-Israel Binational Science Foundation

ISF - the Israel Science Foundation

Publisher

Association for Computing Machinery (ACM)

Link

https://dl.acm.org/doi/pdf/10.1145/3639329

References (54 articles)

1. 2015. Flights Dataset. https://www.kaggle.com/usdot/flight-delays?select=flights.csv.

2. 2020. Spotify Dataset. https://www.kaggle.com/datasets/mrmorj/dataset-of-songs-in-spotify.

3. 2023. TabEE git repository. https://github.com/KathyRaz/TabEE.

4. DIFF: a relational interface for large-scale data explanation

5. TabNet: Attentive Interpretable Tabular Learning