
Turaco: Complexity-Guided Data Sampling for Training Neural Surrogates of Programs

Published: 2023-10-16  Issue: OOPSLA2  Volume: 7  Pages: 1648-1676
ISSN: 2475-1421
Container title: Proceedings of the ACM on Programming Languages
Language: English
Short container title: Proc. ACM Program. Lang.

Authors:

Alex Renda¹, Yi Ding², Michael Carbin¹

Affiliations:

1. Massachusetts Institute of Technology, Cambridge, USA

2. Purdue University, West Lafayette, USA

Abstract

Programmers and researchers are increasingly developing surrogates of programs, models of a subset of the observable behavior of a given program, to solve a variety of software development challenges. Programmers train surrogates from measurements of the behavior of a program on a dataset of input examples. A key challenge of surrogate construction is determining what training data to use to train a surrogate of a given program. We present a methodology for sampling datasets to train neural-network-based surrogates of programs. We first characterize the proportion of data to sample from each region of a program's input space (corresponding to different execution paths of the program) based on the complexity of learning a surrogate of the corresponding execution path. We next provide a program analysis to determine the complexity of different paths in a program. We evaluate these results on a range of real-world programs, demonstrating that complexity-guided sampling results in empirical improvements in accuracy.
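The abstract describes allocating a sampling budget across execution paths according to each path's learning complexity. The sketch below illustrates one way such an allocation can be derived. It is an assumption-laden illustration, not necessarily the paper's exact rule. It assumes a hypothetical error model in which a surrogate trained on n_i samples from path i has error proportional to sqrt(c_i / n_i), the square-root-of-complexity-over-samples shape that appears in generalization bounds such as Arora et al. (2019). It also assumes that total error weights each path by its input frequency f_i. Minimizing sum_i f_i * sqrt(c_i / n_i) subject to sum_i n_i = N with a Lagrange multiplier gives n_i proportional to f_i^(2/3) * c_i^(1/3). The names `complexity_guided_allocation`, `sample_dataset`, and `toy_program`, and the per-path input samplers, are all hypothetical.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple


def complexity_guided_allocation(
    freqs: Sequence[float], complexities: Sequence[float], budget: int
) -> List[int]:
    """Split `budget` samples across paths with n_i proportional to f_i^(2/3) * c_i^(1/3).

    Assumed (hypothetical) error model: the surrogate's error on path i scales as
    sqrt(c_i / n_i), and the total error weights each path by its frequency f_i.
    """
    if len(freqs) != len(complexities) or not freqs:
        raise ValueError("need one frequency and one complexity per path")
    weights = [f ** (2 / 3) * c ** (1 / 3) for f, c in zip(freqs, complexities)]
    total = sum(weights)
    if total <= 0:
        raise ValueError("at least one path needs positive frequency and complexity")
    raw = [budget * w / total for w in weights]
    counts = [math.floor(r) for r in raw]
    # Give the samples lost to rounding down to the paths with the largest
    # fractional remainders, so the counts sum exactly to `budget`.
    leftover = budget - sum(counts)
    by_remainder = sorted(range(len(raw)), key=lambda i: raw[i] - counts[i], reverse=True)
    for i in by_remainder[:leftover]:
        counts[i] += 1
    return counts


Sampler = Callable[[random.Random], float]


def sample_dataset(
    program: Callable[[float], float],
    path_samplers: Sequence[Sampler],
    counts: Sequence[int],
    rng: random.Random,
) -> List[Tuple[float, float]]:
    """Draw counts[i] inputs from path i's input region and label each by running the program."""
    data: List[Tuple[float, float]] = []
    for draw, n in zip(path_samplers, counts):
        for _ in range(n):
            x = draw(rng)
            data.append((x, program(x)))
    return data


# Hypothetical two-path program: the branch on x < 0 splits the input space.
def toy_program(x: float) -> float:
    return x * x if x < 0 else math.sin(3 * x)


if __name__ == "__main__":
    # Assumed inputs: both paths equally frequent, the sin path 8x as complex.
    counts = complexity_guided_allocation([0.5, 0.5], [1.0, 8.0], budget=1000)
    print(counts)  # [333, 667]
    # Uniform samplers restricted to each path's input region (x < 0, x >= 0).
    samplers = [lambda r: r.uniform(-1.0, 0.0), lambda r: r.uniform(0.0, 1.0)]
    data = sample_dataset(toy_program, samplers, counts, random.Random(0))
    print(len(data))  # 1000
```

Under this model, complexity enters only through a cube root, so a path that is 8 times as complex receives 2 times as many samples, not 8 times. Frequency enters more strongly, through the 2/3 power. A different error model would yield different exponents.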

Funder

National Science Foundation

Defense Advanced Research Projects Agency

Publisher

Association for Computing Machinery (ACM)

Subject

Safety, Risk, Reliability and Quality; Software

Link

https://dl.acm.org/doi/pdf/10.1145/3622856

References (45 articles)

1. The Frankencamera

2. Atish Agarwala, Abhimanyu Das, Brendan Juba, Rina Panigrahy, Vatsal Sharan, Xin Wang, and Qiuyi Zhang. 2021. One Network Fits All? Modular versus Monolithic Task Formulations in Neural Networks. In International Conference on Learning Representations.

3. Sanjeev Arora, Simon Du, Wei Hu, Zhiyuan Li, and Ruosong Wang. 2019. Fine-Grained Analysis of Optimization and Generalization for Overparameterized Two-Layer Neural Networks. In International Conference on Machine Learning.

4. White box sampling in uncertain data processing enabled by program analysis

5. Probability type inference for flexible approximate programming