Affiliation:
1. University of Texas at Austin, USA
2. Microsoft Research, USA
Abstract
In application domains that store data in a tabular format, a common task is to fill the values of some cells using values stored in other cells. For instance, such data completion tasks arise in the context of missing value imputation in data science and derived data computation in spreadsheets and relational databases. Unfortunately, end-users and data scientists typically struggle with many data completion tasks that require non-trivial programming expertise. This paper presents a synthesis technique for automating data completion tasks using programming-by-example (PBE) and a very lightweight sketching approach. Given a formula sketch (e.g., AVG(?₁, ?₂)) and a few input-output examples for each hole, our technique synthesizes a program to automate the desired data completion task. Towards this goal, we propose a domain-specific language (DSL) that combines spatial and relational reasoning over tabular data and a novel synthesis algorithm that can generate DSL programs that are consistent with the input-output examples. The key technical novelty of our approach is a new version space learning algorithm that is based on finite tree automata (FTA). The use of FTAs in the learning algorithm leads to a more compact representation that allows more sharing between programs that are consistent with the examples. We have implemented the proposed approach in a tool called DACE and evaluated it on 84 benchmarks taken from online help forums. We also illustrate the advantages of our approach by comparing our technique against two existing synthesizers, namely Prose and Sketch.
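To make the idea of an FTA-based version space concrete, here is a minimal sketch in the spirit of the abstract. It is not DACE's DSL or algorithm. It assumes a hypothetical toy spatial DSL in which a hole is filled by a cell reference built from a `start` cell (the cell being completed) and four one-step moves (`up`, `down`, `left`, `right`). Automaton states are vectors of cell positions, one entry per example, so every program that produces the same result vector shares a single state. The functions `build_fta`, `extract`, `learn_hole`, `run`, and `show` and the example table are all illustrative assumptions.

```python
from collections import deque

# Hypothetical toy DSL (an assumption, NOT the DSL from the paper):
#   ref ::= start | up(ref) | down(ref) | left(ref) | right(ref)
MOVES = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}
Pos = tuple[int, int]


def build_fta(table, targets: list[Pos]):
    """Build an FTA whose states are position vectors, one entry per example.

    transitions[q] lists every rule (operator, child_state) that reaches q;
    programs with identical results on all examples share one state.
    """
    rows, cols = len(table), len(table[0])
    start = tuple(targets)
    transitions = {start: [("start", None)]}  # nullary rule for "start"
    work = deque([start])
    while work:  # breadth-first fixpoint; finitely many in-table vectors
        q = work.popleft()
        for op, (dr, dc) in MOVES.items():
            nxt = tuple((r + dr, c + dc) for r, c in q)
            if not all(0 <= r < rows and 0 <= c < cols for r, c in nxt):
                continue  # drop states that leave the table
            if nxt not in transitions:
                transitions[nxt] = []
                work.append(nxt)
            transitions[nxt].append((op, q))
    return start, transitions


def extract(state, transitions):
    """Follow first-recorded rules (BFS parents) back to start."""
    ops = []
    op, child = transitions[state][0]
    while child is not None:
        ops.append(op)
        op, child = transitions[child][0]
    return ops[::-1]  # innermost move first


def learn_hole(table, examples):
    """examples: list of (target_cell, expected_value) pairs for one hole."""
    _, transitions = build_fta(table, [pos for pos, _ in examples])
    # Dicts keep insertion (= BFS discovery) order, so the first accepting
    # state found is reachable by a shortest program.
    for q in transitions:
        if all(table[r][c] == want for (r, c), (_, want) in zip(q, examples)):
            return extract(q, transitions)
    return None


def run(ops, table, pos):
    """Evaluate a learned reference program at a new target cell."""
    r, c = pos
    for op in ops:
        dr, dc = MOVES[op]
        r, c = r + dr, c + dc
        if not (0 <= r < len(table) and 0 <= c < len(table[0])):
            raise ValueError("program leaves the table")
    return table[r][c]


def show(ops):
    s = "start"
    for op in ops:
        s = f"{op}({s})"
    return s


# Fill the sketch AVG(?1, ?2) in column 2; rows 0 and 1 serve as examples.
table = [
    [10, 20, 15],
    [30, 50, 40],
    [6, 8, None],
]
hole1 = learn_hole(table, [((0, 2), 10), ((1, 2), 30)])
hole2 = learn_hole(table, [((0, 2), 20), ((1, 2), 50)])
args = [run(hole1, table, (2, 2)), run(hole2, table, (2, 2))]
value = sum(args) / len(args)  # AVG over the two hole values
print(show(hole1), show(hole2), value)  # left(left(start)) left(start) 7.0
```

Because every rule between in-table states is recorded, the automaton still represents all consistent programs (infinitely many, since moves can cycle). Picking the first-discovered rule is just one simple way to extract a small program from that shared representation.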
Publisher
Association for Computing Machinery (ACM)
Subject
Safety, Risk, Reliability and Quality, Software
Cited by
30 articles.
1. Efficient Bottom-Up Synthesis for Programs with Local Variables. Proceedings of the ACM on Programming Languages, 2024-01-05.
2. Relational Synthesis of Recursive Programs via Constraint Annotated Tree Automata. Lecture Notes in Computer Science, 2024.
3. Programming by Example Made Easy. ACM Transactions on Software Engineering and Methodology, 2023-11-24.
4. Saggitarius: A DSL for Specifying Grammatical Domains. Proceedings of the ACM on Programming Languages, 2023-10-16.
5. Inductive Program Synthesis Guided by Observational Program Similarity. Proceedings of the ACM on Programming Languages, 2023-10-16.