Solo: Data Discovery Using Natural Language Questions Via A Self-Supervised Approach

Author:

Wang Qiming (1), Castro Fernandez Raul (1)

Affiliation:

1. The University of Chicago, Chicago, IL, USA

Abstract

Most deployed data discovery systems, such as Google Datasets and open data portals, support only keyword search. Keyword search is geared towards general audiences but limits the types of queries these systems can answer. We propose a new system that lets users write natural language questions directly. A major barrier to using such a learned data discovery system is that it needs training data that is expensive to collect, which limits its utility. In this paper, we introduce a self-supervised approach that assembles training datasets and trains learned discovery systems without human intervention. Doing so requires addressing several challenges, including the design of self-supervised strategies for data discovery, table representation strategies for feeding tables to the models, and relevance models that work well with synthetically generated questions. We combine these contributions into a system, Solo, that solves the problem end to end. The evaluation results demonstrate that the new techniques outperform state-of-the-art approaches on well-known benchmarks. All in all, the technique is a stepping stone towards building learned discovery systems.

Publisher

Association for Computing Machinery (ACM)

Cited by 2 articles.
