Deep Learning for Code Intelligence: Survey, Benchmark and Toolkit

Published: 2024-05-18
ISSN: 0360-0300
Container-title: ACM Computing Surveys
Language: en
Short-container-title: ACM Comput. Surv.

Author:

Wan Yao¹, Bi Zhangqian¹, He Yang², Zhang Jianguo³, Zhang Hongyu⁴, Sui Yulei⁵, Xu Guandong⁶, Jin Hai¹, Yu Philip⁷

Affiliation:

1. School of Computer Science and Technology, Huazhong University of Science and Technology, Wuhan, China

2. Simon Fraser University, Burnaby, Canada

3. Salesforce Inc, San Francisco, United States

4. Chongqing University, Chongqing, China

5. University of New South Wales, Sydney, Australia

6. University of Technology Sydney, Sydney, Australia

7. Department of Computer Science, University of Illinois at Chicago, Chicago, United States

Abstract

Code intelligence leverages machine learning techniques to extract knowledge from extensive code corpora, with the aim of developing intelligent tools to improve the quality and productivity of computer programming. A thriving research community already focuses on code intelligence, with efforts spanning software engineering, machine learning, data mining, natural language processing, and programming languages. In this paper, we conduct a comprehensive literature review on deep learning for code intelligence, from the aspects of code representation learning, deep learning techniques, and application tasks. We also benchmark several state-of-the-art neural models for code intelligence, and provide an open-source toolkit tailored for the rapid prototyping of deep-learning-based code intelligence models. In particular, we inspect the existing code intelligence models on the basis of code representation learning, and provide a comprehensive overview of the present state of code intelligence. Furthermore, we publicly release the source code and data resources to provide the community with a ready-to-use benchmark, which can facilitate the evaluation and comparison of existing and future code intelligence models (https://xcodemind.github.io). Finally, we point out several challenging and promising directions for future research.

Publisher

Association for Computing Machinery (ACM)

Link

https://dl.acm.org/doi/pdf/10.1145/3664597

References: 309 articles.

1. 2019. GitHub. https://www.github.com. [Online; accessed 1-May-2019].

2. 2019. StackOverflow. https://www.stackoverflow.com. [Online; accessed 1-May-2019].

3. Wasi Ahmad, Saikat Chakraborty, Baishakhi Ray, and Kai-Wei Chang. 2021. Unified Pre-training for Program Understanding and Generation. In NAACL. 2655–2668.

4. Wasi Uddin Ahmad, Saikat Chakraborty, Baishakhi Ray, and Kai-Wei Chang. 2020. A Transformer-based Approach for Source Code Summarization. In ACL. 4998–5007.

5. Toufique Ahmed and Premkumar Devanbu. 2022. Multilingual training for Software Engineering. In ICSE.