Affiliation:
1. University of New South Wales, Sydney, Australia
2. Georgia Institute of Technology, Atlanta, USA
Abstract
Context-free language reachability (CFL-reachability) is a prominent model for formulating program analysis problems. Almost all CFL-reachability algorithms are based on the Reps-Horwitz-Sagiv (RHS) tabulation. In essence, the RHS tabulation, based on normalized context-free grammars, is similar to the CYK algorithm for CFL-parsing. Consider a normalized rule S ::= A B and a CFL-reachability problem instance of computing S-edges in the input graph. The RHS tabulation obtains all summary edges (i.e., S-, A-, and B-edges) based on the grammar rules. However, many A- and B-edges are wasted because only a subset of those edges eventually contributes to generating S-edges in the input graph.
This paper proposes a new tabulation strategy for speeding up CFL-reachability by eliminating wasted and unnecessary summary edges. We particularly focus on recursive nonterminals. Our key technical insight is that the wasted edge generations and insertions caused by recursive nonterminals can be avoided by modifying the parse trees either statically (by transforming the grammar) or dynamically (using a specialized online CFL-reachability solver). For example, if a recursive nonterminal B, generated by a rule B ::= B X, appears on the right-hand side of a rule S ::= A B, we can make S recursive (by introducing a new rule S ::= S X) and eliminate the original recursive rule (B ::= B X). Due to the rule S ::= S X, the shapes of the parse trees associated with the left-hand-side nonterminal S become more "skewed". Thus, we name our approach skewed tabulation for CFL-reachability.
Skewed tabulation can significantly improve the scalability of CFL-reachability by reducing wasted and unnecessary summary edges. We have implemented skewed tabulation and applied the corresponding CFL-reachability algorithm to an alias analysis, a value-flow analysis, and a taint analysis. Our extensive evaluation based on SPEC 2017 benchmarks yields promising results. For the three client analyses, CFL-reachability based on skewed tabulation can achieve 3.34×, 1.13× and 2.05× speedup over the state-of-the-art RHS-tabulation-based CFL-reachability solver and consume 60.05%, 20.38% and 63.06% less memory, respectively. Furthermore, the cost of grammar transformation for skewed tabulation is negligible, typically taking less than one second.
Funder
Australian Research Grants
Google
NSF
Defense Advanced Research Projects Agency
Publisher
Association for Computing Machinery (ACM)
Reference45 articles.
1. Alfred V. Aho, Monica S. Lam, Ravi Sethi, and Jeffrey D. Ullman. 2006. Compilers: Principles, Techniques, and Tools (2nd Edition). Addison-Wesley Longman Publishing Co., Inc., USA. isbn:0321486811
2. Alfred V. Aho, Ravi Sethi, and Jeffrey D. Ullman. 1986. Compilers: Principles, Techniques, and Tools. Addison-Wesley.
3. François Bancilhon, David Maier, Yehoshua Sagiv, and Jeffrey D. Ullman. 1986. Magic Sets and Other Strange Ways to Implement Logic Programs. In Proceedings of the Fifth ACM SIGACT-SIGMOD Symposium on Principles of Database Systems (PODS). 1–15.
4. On the power of magic
5. Strictly declarative specification of sophisticated points-to analyses