Abstract
Programming-by-Examples (PBE) involves synthesizing an "intended program" from a small set of user-provided input-output examples. A key PBE strategy has been to restrict the search to a carefully designed, small domain-specific language (DSL) with "effectively-invertible" (EI) operators at the top and "effectively-enumerable" (EE) operators at the bottom. This facilitates an effective combination of a top-down synthesis strategy (which backpropagates outputs over various paths in the DSL using inverse functions) with a bottom-up synthesis strategy (which propagates inputs over various paths in the DSL). We address the problem of scaling synthesis to large DSLs with several non-EI/EE operators. This is motivated by the need to support a richer class of transformations and the need for readable code generation. We propose a novel solution strategy that relies on propagating fewer values over fewer paths.
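To make the top-down/bottom-up combination concrete, here is a minimal sketch over a toy DSL. It is not the FlashFill++ algorithm: the operators `Concat` and `Sub` and the function names `concat_inverse`, `enumerate_leaves`, and `synthesize` are assumptions introduced purely for illustration. `Concat` plays the role of an effectively-invertible operator (its inverse lists every split of the output), and substring extraction plays the role of an effectively-enumerable operator (every value it can produce from the input is listed).

```python
from typing import Dict, List, Tuple


def concat_inverse(output: str) -> List[Tuple[str, str]]:
    """Top-down step: every way to split `output` into two non-empty parts."""
    return [(output[:i], output[i:]) for i in range(1, len(output))]


def enumerate_leaves(inp: str) -> Dict[str, str]:
    """Bottom-up step: map each substring of the input to one program producing it."""
    leaves: Dict[str, str] = {}
    for i in range(len(inp)):
        for j in range(i + 1, len(inp) + 1):
            # Keep the first program found for each distinct value.
            leaves.setdefault(inp[i:j], f"Sub(x, {i}, {j})")
    return leaves


def synthesize(inp: str, output: str) -> List[str]:
    """Return toy programs consistent with a single input-output example."""
    leaves = enumerate_leaves(inp)
    programs: List[str] = []
    # The output may be directly reachable bottom-up.
    if output in leaves:
        programs.append(leaves[output])
    # Otherwise, push the output down through Concat and meet the enumerated values.
    for left, right in concat_inverse(output):
        if left in leaves and right in leaves:
            programs.append(f"Concat({leaves[left]}, {leaves[right]})")
    return programs


print(synthesize("abcd", "cdab"))
```

For the example `"abcd"` to `"cdab"`, only the split `("cd", "ab")` meets values enumerated from the input, so the sketch returns the single program `Concat(Sub(x, 2, 4), Sub(x, 0, 2))`.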
Our first key idea is that of "cut functions" that prune the set of values being propagated by using knowledge of the sub-DSL on the other side. Cuts can be designed to preserve completeness of synthesis; however, DSL designers may use incomplete cuts to have finer control over the kind of programs synthesized. In either case, cuts make search feasible for non-EI/EE operators and efficient for deep DSLs. Our second key idea is that of "guarded DSLs" (gDSLs) that allow a precedence on DSL operators, which dynamically controls exploration of various paths in the DSL. This makes search efficient over grammars with large fanouts without losing recall. It also makes ranking simpler yet more effective in learning an intended program from very few examples. Both cuts and precedence give the DSL designer a mechanism to restrict search to a reasonable, and possibly incomplete, space of programs.
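The two ideas can be illustrated loosely with a toy sketch. It is an assumption-laden illustration, not the paper's formal definitions: `concat_cut`, `learn_upper`, `learn_const`, and `learn_guarded` are hypothetical names. The cut keeps only those splits of the output whose left part the sub-DSL (here, substrings of the input) could produce, which is an incomplete cut if the DSL also allows constants. The guarded learner tries alternatives in precedence order and explores a lower-precedence alternative only when the higher ones fail.

```python
from typing import Callable, List, Optional, Tuple

Example = Tuple[str, str]  # (input, expected output)
Learner = Callable[[List[Example]], Optional[str]]


def concat_cut(inp: str, output: str) -> List[Tuple[str, str]]:
    """Hypothetical cut: keep only splits whose left part is a substring of the input."""
    return [
        (output[:i], output[i:])
        for i in range(1, len(output))
        if output[:i] in inp
    ]


def learn_upper(examples: List[Example]) -> Optional[str]:
    """Succeeds if every output is the upper-cased input."""
    if all(out == inp.upper() for inp, out in examples):
        return "Upper(x)"
    return None


def learn_const(examples: List[Example]) -> Optional[str]:
    """Succeeds if every example has the same output."""
    outputs = {out for _, out in examples}
    if len(outputs) == 1:
        return f"Const({outputs.pop()!r})"
    return None


def learn_guarded(examples: List[Example], alternatives: List[Learner]) -> Optional[str]:
    """Try alternatives in precedence order; return the first that succeeds."""
    for learner in alternatives:
        program = learner(examples)
        if program is not None:
            return program
    return None


# The cut drops the split ("cda", "b") because "cda" does not occur in the input.
print(concat_cut("abcd", "cdab"))
# Both learners are consistent with this single example; precedence picks Upper.
print(learn_guarded([("ab", "AB")], [learn_upper, learn_const]))
```

With a single example such as `("ab", "AB")`, both `Upper(x)` and a constant program are consistent; the precedence order settles the choice without a separate ranking step, which is the intuition behind ranking becoming simpler.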
Using cuts and gDSLs, we have built FlashFill++, an industrial-strength PBE engine for performing rich string transformations, including datetime and number manipulations. The FlashFill++ gDSL is designed to enable readable code generation in different target languages, including Excel's formula language, PowerFx, and Python. We show that FlashFill++ is more expressive, more performant, and generates higher-quality code than comparable existing PBE systems. FlashFill++ is being deployed in several mass-market products ranging from spreadsheet software to notebooks and business intelligence applications, each with millions of users.
Publisher
Association for Computing Machinery (ACM)
Subject
Safety, Risk, Reliability and Quality, Software
Cited by
10 articles.
1. VRDSynth: Synthesizing Programs for Multilingual Visually Rich Document Information Extraction;Proceedings of the 33rd ACM SIGSOFT International Symposium on Software Testing and Analysis;2024-09-11
2. Towards Efficient Data Wrangling with LLMs using Code Generation;Proceedings of the Eighth Workshop on Data Management for End-to-End Machine Learning;2024-06-09
3. Hydra: Generalizing Peephole Optimizations with Program Synthesis;Proceedings of the ACM on Programming Languages;2024-04-29
4. Enhanced Enumeration Techniques for Syntax-Guided Synthesis of Bit-Vector Manipulations;Proceedings of the ACM on Programming Languages;2024-01-05
5. Programming-by-Demonstration for Long-Horizon Robot Tasks;Proceedings of the ACM on Programming Languages;2024-01-05