
A Survey of Machine Learning for Big Code and Naturalness

Published: 2019-07-31 · Volume: 51 · Issue: 4 · Pages: 1-37
ISSN: 0360-0300
Journal: ACM Computing Surveys
Language: English
Abbreviated journal title: ACM Comput. Surv.

Authors:

Miltiadis Allamanis¹, Earl T. Barr², Premkumar Devanbu³, Charles Sutton⁴

Affiliations:

1. Microsoft Research, Cambridge, United Kingdom

2. University College London, Gower Street, London, United Kingdom

3. University of California, Davis, California, USA

4. University of Edinburgh and The Alan Turing Institute, Edinburgh, United Kingdom

Abstract

Research at the intersection of machine learning, programming languages, and software engineering has recently taken important steps in proposing learnable probabilistic models of source code that exploit the abundance of patterns of code. In this article, we survey this work. We contrast programming languages against natural languages and discuss how these similarities and differences drive the design of probabilistic models. We present a taxonomy based on the underlying design principles of each model and use it to navigate the literature. Then, we review how researchers have adapted these models to application areas and discuss cross-cutting and application-specific challenges and opportunities.

Funders

Engineering and Physical Sciences Research Council

National Science Foundation

Publisher

Association for Computing Machinery (ACM)

Subjects

General Computer Science, Theoretical Computer Science

Link

https://dl.acm.org/doi/pdf/10.1145/3212695

References: 197 articles

1. Mining API patterns as partial orders from source code

2. Learning natural coding conventions

3. Suggesting accurate method and class names

Cited by: 453 articles

1. HardVD: High-capacity cross-modal adversarial reprogramming for data-efficient vulnerability detection. Information Sciences, 2025-01.

2. GraphPyRec: A novel graph-based approach for fine-grained Python code recommendation. Science of Computer Programming, 2024-12.

3. On Representation Learning-based Methods for Effective, Efficient, and Scalable Code Retrieval. Neurocomputing, 2024-10.

4. On the effectiveness of hybrid pooling in mixup-based graph learning for language processing. Journal of Systems and Software, 2024-10.

5. Impermanent identifiers: Enhanced source code comprehension and refactoring. Journal of Systems and Software, 2024-10.