Affiliation:
1. Korea University, Republic of Korea
Abstract
We present a new machine-learning algorithm with a disjunctive model for data-driven program analysis. One major challenge in static program analysis is the substantial manual effort required to tune the analysis performance. Recently, data-driven program analysis has emerged to address this challenge by automatically adjusting the analysis based on data through a learning algorithm. Although this new approach has proven promising for various program analysis tasks, its effectiveness has been limited by simple-minded learning models and algorithms that are unable to capture sophisticated, in particular disjunctive, program properties. To overcome this shortcoming, this article presents a new disjunctive model for data-driven program analysis as well as a learning algorithm to find the model parameters. Our model uses Boolean formulas over atomic features and is therefore able to express nonlinear combinations of program properties. A key technical challenge is to efficiently determine a set of good Boolean formulas, as brute-force search would simply be impractical. We present a stepwise and greedy algorithm that efficiently learns Boolean formulas. We show the effectiveness and generality of our algorithm with two static analyzers: context-sensitive points-to analysis for Java and flow-sensitive interval analysis for C. Experimental results show that our automated technique significantly improves on the performance of state-of-the-art techniques, including ones hand-crafted by human experts.
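To make the idea of Boolean formulas over atomic features concrete, the following is a minimal illustrative sketch and not the article's algorithm. It assumes a formula is kept in disjunctive normal form and that one conjunction is grown greedily, one literal at a time, as long as a score improves. All names (`clause_holds`, `formula_holds`, `greedy_refine`, `score`) and the toy data are hypothetical.

```python
from itertools import product
from typing import Callable, FrozenSet, List, Sequence, Tuple

Literal = Tuple[int, bool]    # (feature index, required truth value)
Clause = FrozenSet[Literal]   # conjunction of literals
Formula = List[Clause]        # disjunction of clauses (DNF)


def clause_holds(clause: Clause, features: Sequence[bool]) -> bool:
    """A conjunction holds when every literal matches the feature vector."""
    return all(features[i] == value for i, value in clause)


def formula_holds(formula: Formula, features: Sequence[bool]) -> bool:
    """A disjunction holds when at least one of its clauses holds."""
    return any(clause_holds(clause, features) for clause in formula)


def greedy_refine(n_features: int,
                  score: Callable[[Clause], float],
                  max_literals: int = 3) -> Clause:
    """Grow one clause stepwise, adding the best-scoring literal each round.

    Stops when no single added literal strictly improves the score.
    """
    clause: Clause = frozenset()
    best = score(clause)
    for _ in range(max_literals):
        used = {i for i, _ in clause}
        candidates = [clause | {(i, v)}
                      for i in range(n_features) if i not in used
                      for v in (True, False)]
        if not candidates:
            break
        top = max(candidates, key=score)
        if score(top) <= best:
            break
        clause, best = top, score(top)
    return clause


# Toy data (assumed): the hidden property is "f0 and not f1" over 3 features.
samples = [(fs, fs[0] and not fs[1]) for fs in product((False, True), repeat=3)]


def score(clause: Clause) -> float:
    """Number of toy samples the clause classifies correctly."""
    return sum(clause_holds(clause, fs) == label for fs, label in samples)


learned = greedy_refine(3, score)
```

On this toy data the greedy search recovers the two literals of the hidden property without enumerating every conjunction. In a real analyzer the score would presumably come from running the analysis, which is far more expensive than this counting function.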
Funder
Samsung Research Funding & Incubation Center of Samsung Electronics
Institute for Information & communications Technology Promotion
Korea government
Self-Learning Cyber Immune Technology Development
Publisher
Association for Computing Machinery (ACM)
Cited by
17 articles.
1. Falcon: A Fused Approach to Path-Sensitive Sparse Data Dependence Analysis;Proceedings of the ACM on Programming Languages;2024-06-20
2. Generic Sensitivity: Generics-Guided Context Sensitivity for Pointer Analysis;IEEE Transactions on Software Engineering;2024-05
3. Learning Abstraction Selection for Bayesian Program Analysis;Proceedings of the ACM on Programming Languages;2024-04-29
4. Dataflow Analysis-Inspired Deep Learning for Efficient Vulnerability Detection;Proceedings of the IEEE/ACM 46th International Conference on Software Engineering;2024-02-06
5. Precise Data-Driven Approximation for Program Analysis via Fuzzing;2023 38th IEEE/ACM International Conference on Automated Software Engineering (ASE);2023-09-11