I Know What You Are Searching for: Code Snippet Recommendation from Stack Overflow Posts

Author:

Gao Zhipeng (1), Xia Xin (2), Lo David (3), Grundy John (4), Zhang Xindong (5), Xing Zhenchang (6)

Affiliation:

1. Shanghai Institute for Advanced Study of Zhejiang University, Shanghai, China

2. Huawei, Zhejiang Province, China

3. Singapore Management University, Singapore, Singapore

4. Monash University, Australia

5. Alibaba Group, China

6. CSIRO’s Data61 & Australian National University, Canberra ACT, Australia

Abstract

Software developers rely heavily on Stack Overflow for programming-related information. More and more developers use Community Question and Answer forums, such as Stack Overflow, to search for code examples that show how to accomplish a particular coding task. This is often considered more efficient than working from source documentation, tutorials, or full worked examples. However, because these forums are complex and contain a very large volume of information, developers can be overwhelmed, which makes it hard to find, or even be aware of, the code examples most relevant to their needs. To alleviate this issue, we present a query-driven code recommendation tool, named Que2Code, that identifies the best code snippets for a user query from Stack Overflow posts. Our approach has two main stages: (i) semantically equivalent question retrieval and (ii) best code snippet recommendation. In the first stage, given a query question formulated by a developer, we first generate paraphrase questions for the input query as a form of query boosting and then retrieve the relevant Stack Overflow questions based on these generated questions. In the second stage, we collect all of the code snippets within the questions retrieved in the first stage and develop a novel scheme that ranks the code snippet candidates via pairwise comparisons. We conduct a large-scale experiment on Python and Java datasets from Stack Overflow that evaluates the semantically equivalent question retrieval task and the best code snippet recommendation task separately. We also perform a human study to measure how real-world developers perceive the results generated by our model. Both the automatic and human evaluation results demonstrate the promising performance of our model, and we have released our code and data to assist other researchers.
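
The two-stage flow described in the abstract can be illustrated with a minimal Python sketch. This is not the Que2Code implementation: the paraphrases are supplied by hand rather than generated, token-overlap (Jaccard) similarity stands in for semantic matching, and the comparator `toy_prefer` (which simply prefers shorter code) is a hypothetical placeholder for whatever pairwise comparison model is actually used. All function names, the `posts` structure, and the sample data are assumptions made for illustration.

```python
from itertools import combinations


def tokens(text):
    """Lowercase whitespace tokenization; a stand-in for a learned encoder."""
    return set(text.lower().split())


def jaccard(a, b):
    """Token-overlap similarity between two strings, in [0, 1]."""
    ta, tb = tokens(a), tokens(b)
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


def retrieve_questions(query, paraphrases, posts, top_k=1):
    """Stage 1 (sketch): score each post title by its best match against
    the query or any of its paraphrases, then keep the top_k posts."""
    boosted = [query] + list(paraphrases)
    scored = [
        (max(jaccard(q, post["title"]) for q in boosted), post)
        for post in posts
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [post for score, post in scored[:top_k] if score > 0]


def rank_snippets(query, snippets, prefer):
    """Stage 2 (sketch): compare every pair of candidates with `prefer`
    and order the candidates by their number of pairwise wins."""
    wins = {i: 0 for i in range(len(snippets))}
    for i, j in combinations(range(len(snippets)), 2):
        winner = i if prefer(query, snippets[i], snippets[j]) else j
        wins[winner] += 1
    order = sorted(wins, key=lambda i: wins[i], reverse=True)
    return [snippets[i] for i in order]


def toy_prefer(query, a, b):
    """Hypothetical comparator: prefers the shorter snippet and ignores
    the query. A real system would use a trained model here."""
    return len(a) <= len(b)


if __name__ == "__main__":
    # Hypothetical sample data, not taken from the paper's datasets.
    posts = [
        {"title": "how to reverse a list in python",
         "snippets": ["a = list(reversed(items))", "items[::-1]"]},
        {"title": "python sort dictionary by value",
         "snippets": ["sorted(d.items(), key=lambda kv: kv[1])"]},
    ]
    query = "invert list python"
    paraphrases = ["how to reverse a list in python"]  # hand-written

    retrieved = retrieve_questions(query, paraphrases, posts)
    candidates = [s for post in retrieved for s in post["snippets"]]
    for snippet in rank_snippets(query, candidates, toy_prefer):
        print(snippet)
```

Passing the comparator in as a parameter keeps the ranking loop independent of how the pairwise preference is computed, so the placeholder heuristic could be swapped for a learned model without changing the rest of the sketch.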

Funder

ARC Laureate Fellowship

National Research Foundation, Singapore, under its Industry Alignment Fund – Pre-positioning (IAF-PP) Funding Initiative

Publisher

Association for Computing Machinery (ACM)

Subject

Software

Cited by 6 articles.

1. Learning beyond books: A hybrid model to learn real‐world problems;Computer Applications in Engineering Education;2024-08-21

2. Automatic bi-modal question title generation for Stack Overflow with prompt learning;Empirical Software Engineering;2024-05

3. TopicAns: Topic-informed Architecture for Answer Recommendation on Technical Q&A Site;ACM Transactions on Software Engineering and Methodology;2023-11-24

4. A Closer Look at Different Difficulty Levels Code Generation Abilities of ChatGPT;2023 38th IEEE/ACM International Conference on Automated Software Engineering (ASE);2023-09-11

5. Automated Question Title Reformulation by Mining Modification Logs From Stack Overflow;IEEE Transactions on Software Engineering;2023-09-01
