DeepBugs: a learning approach to name-based bug detection

Published: 2018-10-24
Issue: OOPSLA
Volume: 2
Pages: 1-25
ISSN: 2475-1421
Journal: Proceedings of the ACM on Programming Languages
Language: English
Journal abbreviation: Proc. ACM Program. Lang.

Authors:

Michael Pradel¹, Koushik Sen²

Affiliations:

1. TU Darmstadt, Germany

2. University of California at Berkeley, USA

Abstract

Natural language elements in source code, e.g., the names of variables and functions, convey useful information. However, most existing bug detection tools ignore this information and therefore miss some classes of bugs. The few existing name-based bug detection approaches reason about names on a syntactic level and rely on manually designed and tuned algorithms to detect bugs. This paper presents DeepBugs, a learning approach to name-based bug detection, which reasons about names based on a semantic representation and which automatically learns bug detectors instead of manually writing them. We formulate bug detection as a binary classification problem and train a classifier that distinguishes correct from incorrect code. To address the challenge that effectively learning a bug detector requires examples of both correct and incorrect code, we create likely incorrect code examples from an existing corpus of code through simple code transformations. A novel insight learned from our work is that learning from artificially seeded bugs yields bug detectors that are effective at finding bugs in real-world code. We implement our idea into a framework for learning-based and name-based bug detection. Three bug detectors built on top of the framework detect accidentally swapped function arguments, incorrect binary operators, and incorrect operands in binary operations. Applying the approach to a corpus of 150,000 JavaScript files yields bug detectors that have a high accuracy (between 89% and 95%), are very efficient (less than 20 milliseconds per analyzed file), and reveal 102 programming mistakes (with 68% true positive rate) in real-world code.
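The core recipe in the abstract (treat observed code as likely correct, derive likely incorrect code through a simple transformation such as swapping the two arguments of a call, then train a binary classifier on name-based features) can be sketched in a few dozen lines. This is a minimal sketch under stated assumptions, not the DeepBugs implementation. The toy call sites are hypothetical. Position-tagged subtoken features stand in for the paper's semantic name representation, and a plain logistic regression stands in for its learned classifier. The helper names (`subtokens`, `features`, `make_examples`, `train`, `score`) are illustrative only.

```python
import math
import random
import re

# Hypothetical toy corpus of two-argument call sites: (callee, first arg, second arg).
# DeepBugs instead draws its positive examples from a large corpus of JavaScript code.
CORPUS = [
    ("setSize", "width", "height"),
    ("resize", "newWidth", "newHeight"),
    ("setTimeout", "callback", "delay"),
    ("setTimeout", "handler", "timeoutMs"),
    ("Point", "x", "y"),
    ("moveTo", "posX", "posY"),
    ("copy", "src", "dest"),
    ("copyFile", "source", "destination"),
]


def subtokens(name):
    """Split an identifier such as 'setTimeout' or 'max_width' into lowercase subtokens."""
    parts = re.findall(r"[A-Z]?[a-z]+|[A-Z]+(?![a-z])|\d+", name)
    return [p.lower() for p in parts] or [name.lower()]


def features(call):
    """Sparse features keyed by (position, subtoken).

    Assumed stand-in for the semantic name representation: keeping the argument
    position in the key is what lets a linear model tell f(a, b) from f(b, a).
    """
    vec = {}
    for position, name in zip(("callee", "arg1", "arg2"), call):
        for tok in subtokens(name):
            key = (position, tok)
            vec[key] = vec.get(key, 0.0) + 1.0
    return vec


def make_examples(calls):
    """Label observed calls 1 (likely correct) and argument-swapped copies 0 (likely buggy)."""
    examples = []
    for callee, first, second in calls:
        examples.append(((callee, first, second), 1))
        if first != second:  # swapping identical names would not change the code
            examples.append(((callee, second, first), 0))
    return examples


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def score(model, call):
    """Probability that the call is correct according to the model."""
    weights, bias = model
    z = bias + sum(weights.get(k, 0.0) * v for k, v in features(call).items())
    return sigmoid(z)


def train(examples, epochs=50, lr=0.5, seed=0):
    """Fit a logistic regression with plain SGD on the log loss (not the paper's model)."""
    weights, bias = {}, 0.0
    data = list(examples)
    rng = random.Random(seed)
    for _ in range(epochs):
        rng.shuffle(data)
        for call, label in data:
            err = label - score((weights, bias), call)  # gradient of the log loss
            bias += lr * err
            for k, v in features(call).items():
                weights[k] = weights.get(k, 0.0) + lr * err * v
    return weights, bias


if __name__ == "__main__":
    model = train(make_examples(CORPUS))
    for call in [("setBounds", "width", "height"), ("setBounds", "height", "width")]:
        print(call, round(score(model, call), 3))
```

Because the toy features keep each argument's position, the model learns that `width` tends to come first and `height` second, so it scores `setBounds(width, height)` above the swapped call even though `setBounds` never occurs in the corpus. A practical detector would report calls whose score falls below some threshold and would be trained on far more code.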

Publisher

Association for Computing Machinery (ACM)

Subjects

Safety, Risk, Reliability and Quality; Software

Link

https://dl.acm.org/doi/pdf/10.1145/3276517

References: 62 articles

1. Building Useful Program Analysis Tools Using an Extensible Java Compiler

2. Learning natural coding conventions

Cited by: 177 articles

1. Enhancing vulnerability detection via AST decomposition and neural sub-tree encoding. Expert Systems with Applications, 2024-03.

2. Predicting Performance and Accuracy of Mixed-Precision Programs for Precision Tuning. Proceedings of the 46th IEEE/ACM International Conference on Software Engineering, 2024-02-06.

3. Large Language Models are Few-Shot Summarizers: Multi-Intent Comment Generation via In-Context Learning. Proceedings of the 46th IEEE/ACM International Conference on Software Engineering, 2024-02-06.

4. An Empirical Evaluation of Using Large Language Models for Automated Unit Test Generation. IEEE Transactions on Software Engineering, 2024-01.

5. Deep learning with class-level abstract syntax tree and code histories for detecting code modification requirements. Journal of Systems and Software, 2023-12.