
Generating precise error specifications for C: a zero shot learning approach

Published: 2019-10-10; Issue: OOPSLA; Volume: 3; Pages: 1–30
ISSN: 2475-1421
Container title: Proceedings of the ACM on Programming Languages
Language: English
Short container title: Proc. ACM Program. Lang.

Authors:

Baijun Wu¹, John Peter Campora III¹, Yi He¹, Alexander Schlecht¹, Sheng Chen¹

Affiliation:

1. University of Louisiana at Lafayette, USA

Abstract

In C programs, error specifications, which specify the value range that each function returns to indicate failures, are widely used to check and propagate errors for the sake of reliability and security. Various kinds of C analyzers employ error specifications for different purposes, e.g., to detect error handling bugs, yet a general approach for generating precise specifications is still missing. This limits the applicability of those tools. In this paper, we solve this problem by developing a machine learning-based approach named MLPEx. It generates error specifications by analyzing only the source code, and is thus general. We propose a novel machine learning paradigm based on transfer learning, enabling MLPEx to require only one-time minimal data labeling from us (as the tool developers) and zero manual labeling efforts from users. To improve the accuracy of generated error specifications, MLPEx extracts and exploits project-specific information. We evaluate MLPEx on 10 projects, including 6 libraries and 4 applications. An investigation of 3,443 functions and 17,750 paths reveals that MLPEx generates error specifications with a precision of 91% and a recall of 94%, significantly higher than those of state-of-the-art approaches. To further demonstrate the usefulness of the generated error specifications, we use them to detect 57 bugs in 5 tested projects.
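The abstract defines an error specification as the range of values a function returns to signal failure. The C sketch below illustrates only that notion. It is not MLPEx, which infers such ranges from source code using machine learning. The names `error_spec`, `find_spec`, `is_error_value`, `parse_header`, and `open_resource` are hypothetical, as are the two sample specs. The specs mimic common C conventions: a negative value on failure, or 0 (a NULL handle) on failure.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical error specification: the inclusive range [lo, hi] of
   return values by which a function signals failure. */
typedef struct {
    const char *function; /* name of the specified function */
    long lo;              /* inclusive lower bound of the error range */
    long hi;              /* inclusive upper bound of the error range */
} error_spec;

/* Illustrative specs following common C conventions (assumed examples,
   not specifications generated by MLPEx). */
static const error_spec SPECS[] = {
    { "parse_header", LONG_MIN, -1 }, /* any negative return means failure */
    { "open_resource", 0, 0 },        /* 0 (a NULL handle) means failure */
};

/* Returns the spec recorded for fn, or NULL if none is known. */
const error_spec *find_spec(const char *fn) {
    assert(fn != NULL); /* caller must pass a function name */
    for (size_t i = 0; i < sizeof SPECS / sizeof SPECS[0]; i++) {
        if (strcmp(SPECS[i].function, fn) == 0)
            return &SPECS[i];
    }
    return NULL;
}

/* True if ret, a value returned by fn, lies in fn's error range.
   Functions without a spec are treated as never signalling failure. */
bool is_error_value(const char *fn, long ret) {
    const error_spec *spec = find_spec(fn);
    return spec != NULL && ret >= spec->lo && ret <= spec->hi;
}

/* Hypothetical function obeying the "negative on failure" spec:
   returns the length of a non-empty header, or -1 on bad input. */
long parse_header(const char *s) {
    if (s == NULL || s[0] == '\0')
        return -1;
    return (long)strlen(s);
}
```

A checker of the kind the abstract mentions could, for example, use such a table to flag a call site that uses `parse_header`'s result without first testing it against the error range. How real analyzers represent and consume specifications varies from tool to tool.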

Funder

National Science Foundation

Publisher

Association for Computing Machinery (ACM)

Subject

Safety, Risk, Reliability and Quality; Software

Link

https://dl.acm.org/doi/pdf/10.1145/3360586

References: 69 articles

1. 2007. OWASP Top 10. https://www.owasp.org/images/e/e8/OWASP_Top_10_2007.pdf

2. 2019. CVE-2019-12818. https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2019-12818

3. 2019. MongoDB. https://www.mongodb.com/

4. 2019. Neo4j. https://neo4j.com/

5. Mithun Acharya and Tao Xie. 2009. Mining API Error-Handling Specifications from Source Code. In Proceedings of the 12th International Conference on Fundamental Approaches to Software Engineering: Held as Part of the Joint European Conferences on Theory and Practice of Software, ETAPS 2009 (FASE ’09). Springer-Verlag, Berlin, Heidelberg, 370–384.

Cited by: 4 articles

1. Interleaving Static Analysis and LLM Prompting; Proceedings of the 13th ACM SIGPLAN International Workshop on the State Of the Art in Program Analysis; 2024-06-20

2. Transcode: Detecting Status Code Mapping Errors in Large-Scale Systems; 2021 36th IEEE/ACM International Conference on Automated Software Engineering (ASE); 2021-11

3. Learning with Partial Multi-Outlooks; 2020 International Joint Conference on Neural Networks (IJCNN); 2020-07

4. Detecting and reproducing error-code propagation bugs in MPI implementations; Proceedings of the 25th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming; 2020-02-19