Contextual word embeddings for tabular data search and integration

Published:2022-11-30 Issue:13 Volume:35 Page:9319-9333
ISSN:0941-0643
Container-title:Neural Computing and Applications
language:en
Short-container-title:Neural Comput & Applic

Author:

Pilaluisa José, Tomás David, Navarro-Colorado Borja, Mazón Jose-Norberto

Abstract

This paper presents a new approach to retrieve and further integrate tabular datasets (collections of rows and columns) using union and join operations. In this work, both processes were carried out using a similarity measure based on contextual word embeddings, which allows finding semantically similar tables and overcoming the recall problem of lexical approaches based on string similarity. This work is the first attempt to use contextual word embeddings in the whole pipeline of table search and integration, including for the first time their use in the join operation. A comprehensive analysis of their performance was carried out on both retrieving and integrating tabular datasets, comparing them with context-free models. Column headings and cell values were used as contextual information and their impact on each task was evaluated. The results revealed that contextual models significantly outperform context-free models and a traditional weighting schema in ad hoc table retrieval. In the data integration task, contextual models also improved the results on the union operation compared to context-free approaches.

Funder

Conselleria de Innovación, Universidades, Ciencia y Sociedad Digital, Generalitat Valenciana

Spanish Government

Universidad de Alicante

Publisher

Springer Science and Business Media LLC

Subject

Artificial Intelligence, Software

Link

https://link.springer.com/content/pdf/10.1007/s00521-022-08066-8.pdf

References: 40 articles.

1. Abadi D, Ailamaki A, Andersen D, Bailis P, Balazinska M, Bernstein P, Boncz P, Chaudhuri S, Cheung A, Doan A et al (2020) The Seattle report on database research. ACM SIGMOD Rec 48(4):44–53. https://doi.org/10.1145/3385658.3385668

2. Miller RJ (2018) Open data integration. Proc Very Large Data Base Endow 11(12):2130–2139. https://doi.org/10.14778/3229863.3240491

3. Peters ME, Neumann M, Zettlemoyer L, Yih W-t (2018) Dissecting contextual word embeddings: Architecture and representation. In: Proceedings of the 2018 conference on empirical methods in natural language processing, pp. 1499–1509. Association for Computational Linguistics, Brussels, Belgium. https://doi.org/10.18653/v1/D18-1179

4. Zhang S, Balog K (2018) Ad hoc table retrieval using semantic similarity. In: Proceedings of the 2018 world wide web conference, Lyon, France, pp. 1553–1562. https://doi.org/10.1145/3178876.3186067

5. Zhang S, Balog K (2020) Web table extraction, retrieval, and augmentation: a survey. ACM Transact Intelli Syst Technol 11(2):1–35. https://doi.org/10.1145/3372117

Cited by 1 article.

1. Leveraging Large Language Models for Sensor Data Retrieval;Applied Sciences;2024-03-15