Exploring the Impact of Balanced and Imbalanced Learning in Source Code Suggestion

Published: 2022-10, Volume: 32, Issue: 10, Pages: 1499-1526
ISSN: 0218-1940
Container title: International Journal of Software Engineering and Knowledge Engineering
Language: en
Short container title: Int. J. Soft. Eng. Knowl. Eng.

Author:

Hussain Yasir¹, Huang Zhiqiu¹, Zhou Yu¹, Khan Izhar Ahmed¹

Affiliation:

1. College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics (NUAA), Nanjing, Jiangsu 211106, P. R. China

Abstract

Studies have confirmed the robust performance of machine learning classifiers on various source code modeling tasks. In general, however, machine learning approaches handle imbalanced datasets poorly, because they are sensitive to the relative proportions of the classes and may therefore lean towards the classes with a large percentage of observations. In this work, we investigate the impact of balanced and imbalanced learning on the source code suggestion task, otherwise known as code completion, which covers a large number of imbalanced classes. We further explore the impact of vocabulary size on modeling performance. First, we provide the essentials to formulate source code suggestion as a classification task and investigate the level of class imbalance. Second, we train the four most widely adopted neural language models as baselines to assess modeling performance. Third, we apply two class balancing techniques, TomekLinks and AllKNN, to balance the datasets and evaluate their impact on modeling performance. Finally, we train these models with a weighted imbalanced learning approach and compare the performance with the balanced learning approaches. Additionally, we train models with varying vocabulary sizes to study their impact. In total, we trained 230 models on 10 real-world software projects and extensively evaluated them with widely used performance metrics such as precision, recall, F-score, mean reciprocal rank (MRR), and receiver operating characteristic (ROC). We also employed ANOVA to study the statistical significance of the differences between these approaches. The study demonstrates that modeling performance decreases under balanced model training, whereas weighted imbalanced training produces comparable results and is more efficient in terms of time cost. It also shows that a large vocabulary does not necessarily improve modeling performance when out-of-vocabulary predictions are disregarded.
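
To make two of the ideas in the abstract concrete, the following is a minimal Python sketch of class weighting for imbalanced next-token labels and of the mean reciprocal rank (MRR) metric. The abstract does not state which weighting scheme the authors used, so the inverse-frequency formula here (n_samples / (n_classes * class_count)) is an assumption chosen for illustration. The function names and the toy data are hypothetical.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Inverse-frequency weights: n_samples / (n_classes * class_count).

    Assumed weighting scheme for illustration; the abstract does not
    specify the one used in the study.
    """
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}


def mean_reciprocal_rank(ranked_suggestions, targets):
    """Average of 1/rank of the correct token; contributes 0 if it is absent."""
    total = 0.0
    for suggestions, target in zip(ranked_suggestions, targets):
        if target in suggestions:
            total += 1.0 / (suggestions.index(target) + 1)
    return total / len(targets)


# Hypothetical next-token labels from a tiny, skewed corpus.
labels = ["i", "i", "i", "i", "=", "=", "return", "("]
weights = balanced_class_weights(labels)

# Hypothetical ranked suggestion lists and the tokens that actually followed.
ranked = [["i", "=", "("], ["=", "i", "return"], ["(", "i", "="]]
targets = ["i", "i", "return"]
mrr = mean_reciprocal_rank(ranked, targets)

print(weights)
print(mrr)
```

In this toy example the frequent token `i` receives a weight below 1 and the rare tokens receive weights above 1, so errors on rare classes would count more in a weighted loss without any samples being removed from the training set.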

Funder

National Outstanding Youth Science Fund Project of National Natural Science Foundation of China

Publisher

World Scientific Pub Co Pte Ltd

Subject

Artificial Intelligence, Computer Graphics and Computer-Aided Design, Computer Networks and Communications, Software

Link

https://www.worldscientific.com/doi/pdf/10.1142/S0218194022500589
