Generating precise error specifications for C: a zero shot learning approach

Authors:

Wu Baijun¹, Campora III John Peter¹, He Yi¹, Schlecht Alexander¹, Chen Sheng¹

Affiliation:

1. University of Louisiana at Lafayette, USA

Abstract

In C programs, error specifications, which specify the value range that each function returns to indicate failures, are widely used to check and propagate errors for the sake of reliability and security. Various kinds of C analyzers employ error specifications for different purposes, e.g., to detect error handling bugs, yet a general approach for generating precise specifications is still missing. This limits the applicability of those tools. In this paper, we solve this problem by developing a machine learning-based approach named MLPEx. It generates error specifications by analyzing only the source code, and is thus general. We propose a novel machine learning paradigm based on transfer learning, enabling MLPEx to require only one-time minimal data labeling from us (as the tool developers) and zero manual labeling efforts from users. To improve the accuracy of generated error specifications, MLPEx extracts and exploits project-specific information. We evaluate MLPEx on 10 projects, including 6 libraries and 4 applications. An investigation of 3,443 functions and 17,750 paths reveals that MLPEx generates error specifications with a precision of 91% and a recall of 94%, significantly higher than those of state-of-the-art approaches. To further demonstrate the usefulness of the generated error specifications, we use them to detect 57 bugs in 5 tested projects.
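To make the notion of an error specification concrete, the following is a minimal sketch in C. It is not MLPEx's implementation or its internal representation; the `ErrorSpec` struct, the `parse_digit` and `sum_digits` functions, and the interval encoding are all hypothetical names chosen for illustration. The sketch assumes a specification can be written as a closed interval of return values that signal failure, and shows a caller using it to check and propagate an error.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical encoding of an error specification: the closed interval
 * [error_min, error_max] of return values that indicate failure. */
typedef struct {
    const char *function_name;
    long error_min;
    long error_max;
} ErrorSpec;

/* Hypothetical library function: returns the digit value 0..9 on success
 * and -1 when the character is not a decimal digit. */
static int parse_digit(char c) {
    if (c < '0' || c > '9') {
        return -1;
    }
    return c - '0';
}

/* A precise specification for parse_digit: only -1 signals failure.
 * A looser one (e.g. "any value <= 0") would misclassify the valid
 * result 0 as an error. */
static const ErrorSpec PARSE_DIGIT_SPEC = { "parse_digit", -1, -1 };

/* Returns true when ret falls inside the error range of spec. */
static bool indicates_error(const ErrorSpec *spec, long ret) {
    return ret >= spec->error_min && ret <= spec->error_max;
}

/* Caller that checks and propagates the error. Returns 0 on success and
 * stores the sum in *out; returns -1 if any character is not a digit.
 * Dropping the check below would be an error handling bug of the kind
 * that analyzers use error specifications to find. */
static int sum_digits(const char *s, int *out) {
    int total = 0;
    for (; *s != '\0'; ++s) {
        int d = parse_digit(*s);
        if (indicates_error(&PARSE_DIGIT_SPEC, d)) {
            return -1; /* propagate the failure to our own caller */
        }
        total += d;
    }
    *out = total;
    return 0;
}
```

Under these assumptions, `sum_digits("123", &out)` returns 0 with `out` set to 6, while `sum_digits("1a3", &out)` returns -1. The precision of the specification matters because an overly wide range would cause an analyzer to flag correct code, and an overly narrow one would let unchecked failures pass.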

Funder

National Science Foundation

Publisher

Association for Computing Machinery (ACM)

Subject

Safety, Risk, Reliability and Quality, Software

Cited by 4 articles.

1. Interleaving Static Analysis and LLM Prompting;Proceedings of the 13th ACM SIGPLAN International Workshop on the State Of the Art in Program Analysis;2024-06-20

2. Transcode: Detecting Status Code Mapping Errors in Large-Scale Systems;2021 36th IEEE/ACM International Conference on Automated Software Engineering (ASE);2021-11

3. Learning with Partial Multi-Outlooks;2020 International Joint Conference on Neural Networks (IJCNN);2020-07

4. Detecting and reproducing error-code propagation bugs in MPI implementations;Proceedings of the 25th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming;2020-02-19
