Revealing the Unseen: AI Chain on LLMs for Predicting Implicit Data Flows to Generate Data Flow Graphs in Dynamically-Typed Code

Authors:

Huang Qing (1), Luo Zhiwen (1), Xing Zhenchang (2), Zeng Jinshan (1), Chen Jieshan (3), Xu Xiwei (3), Chen Yong (1)

Affiliations:

1. Jiangxi Normal University, School of Computer Information Engineering, China

2. CSIRO’s Data61 & Australian National University, College of Engineering and Computer Science, Australia

3. CSIRO’s Data61, Australia

Abstract

Data flow graphs (DFGs) capture the definitions (defs) and uses of variables across program blocks and are a fundamental program representation for program analysis, testing, and maintenance. However, dynamically-typed programming languages such as Python exhibit implicit data flows that make it difficult to determine def-use information at compile time. Static analysis frameworks such as Soot and WALA cannot handle these implicit flows adequately, and manually enumerating comprehensive heuristic rules is impractical. Large pre-trained language models (LLMs) offer a potential solution: their strong language understanding and pattern-matching abilities allow them to predict implicit data flows by analyzing code context and the relationships among variables, functions, and statements. We propose leveraging the in-context learning ability of LLMs to learn implicit rules and patterns from code representations and contextual information, and thereby resolve implicit data flows. To further improve accuracy, we design a five-step Chain of Thought (CoT) and decompose it into an AI Chain in which each step corresponds to a separate AI unit, producing accurate DFGs for Python code. We evaluate the approach thoroughly and demonstrate the effectiveness of each AI unit in the AI Chain. Compared with static analysis, our method achieves 82% higher def coverage and 58% higher use coverage when generating DFGs for implicit data flows. We also show that each unit in the AI Chain is indispensable. Overall, our approach points to a promising direction for building software engineering tools on foundation models: it eliminates significant engineering and maintenance effort and instead focuses effort on identifying the problems for AI to solve.
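To make the implicit data flow problem concrete, the following minimal sketch (an illustration only, not the authors' method) uses Python's standard `ast` module as a naive, purely syntactic def-use collector. The helper `collect_defs_uses` and the code snippet are hypothetical. Because the variable `rate` is defined through a string key into `globals()`, the syntactic pass records a use of `rate` with no matching def, even though executing the snippet shows that the def exists at runtime.

```python
from __future__ import annotations

import ast


def collect_defs_uses(source: str) -> tuple[set[str], set[str]]:
    """Hypothetical naive collector: only sees plain variable names.

    A Name node in Store context counts as a def; in Load context, as a use.
    Defs created dynamically (globals(), setattr, exec, ...) are invisible.
    """
    defs: set[str] = set()
    uses: set[str] = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Name):
            if isinstance(node.ctx, ast.Store):
                defs.add(node.id)
            elif isinstance(node.ctx, ast.Load):
                uses.add(node.id)
    return defs, uses


SNIPPET = """
globals()["rate"] = 0.5      # implicit def: variable created via a string key
price = 100
total = price * rate         # use of 'rate' with no syntactically visible def
"""

defs, uses = collect_defs_uses(SNIPPET)
print(sorted(defs))                       # ['price', 'total']
print(sorted(uses))                       # ['globals', 'price', 'rate']
print(sorted(uses - defs - {"globals"}))  # ['rate']  (use with no visible def)

# Running the snippet shows the def of 'rate' really exists at runtime.
namespace: dict = {}
exec(SNIPPET, namespace)
print(namespace["total"])                 # 50.0
```

A rule-based analyzer could special-case `globals()` subscripts, but `setattr`, `exec`, `**kwargs`, and similar dynamic features each need their own rules, which illustrates why the abstract calls manual enumeration of heuristic rules impractical.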

Publisher

Association for Computing Machinery (ACM)

