Abstract
Code comments are considered an efficient way to document the functionality of a particular block of code. Developers commonly write comments to explain the purpose of their code and thereby improve its comprehensibility and readability. Researchers have investigated the effect of code comments on software development tasks and have shown that comments support maintenance, reuse, and bug detection, among other activities. Given the importance of code comments, it is vital for novice developers to hone their commenting skills. In this study, we first investigated what types of comments novice students write in their source code and then categorized those comments using a machine learning approach. The work involves manually classifying code comments and then building a machine learning model that classifies student code comments automatically. Our findings reveal that the comments of novice developers (students) fall mainly into the Literal (26.66%) and Insufficient (26.66%) categories. Further, we extended the taxonomy of such source code comments with several new categories: License (5.18%), Profile (4.80%), Irrelevant (4.80%), Commented Code (4.44%), Autogenerated (1.48%), and Improper (1.10%). Moreover, we evaluated our approach with three different machine learning classifiers, of which Decision Tree achieved the highest overall accuracy (85%). The approach predicts the type of a novice developer's code comments and could be used to generate automated feedback for students, saving teachers the time-consuming work of giving manual one-on-one feedback.
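The abstract does not describe the features or classifier settings the study used, so the following is only a hypothetical, standard-library sketch of the general idea: each comment is mapped to a few binary features, and an ID3-style (information-gain) decision tree learns to assign a taxonomy category. The feature names, regular expressions, and the tiny labeled sample are all illustrative assumptions, not the authors' data or method, and the study may well have used a different text representation and tree implementation. Only a subset of the taxonomy's categories is covered.

```python
import math
import re
from collections import Counter

# HYPOTHETICAL binary features; the abstract does not describe the study's features.
FEATURES = {
    "mentions_license": lambda c: bool(re.search(r"licen[sc]e|copyright", c, re.I)),
    "mentions_author": lambda c: bool(re.search(r"@author|student id|\bname:", c, re.I)),
    "looks_like_code": lambda c: bool(re.search(r";\s*$|^\s*(if|for|while|return)\b", c)),
    "auto_generated": lambda c: "auto-generated" in c.lower(),
    "very_short": lambda c: len(c.split()) < 3,
}

# HYPOTHETICAL toy sample, NOT the study's data; covers only some taxonomy categories.
TRAIN = [
    ("Licensed under the MIT License", "License"),
    ("Copyright 2021 Example Corp", "License"),
    ("@author Jane Doe, student id 123", "Profile"),
    ("Name: John, Lab 3", "Profile"),
    ("x = x + 1;", "Commented Code"),
    ("return total;", "Commented Code"),
    ("TODO Auto-generated method stub", "Autogenerated"),
    ("loop", "Insufficient"),
    ("variables", "Insufficient"),
    ("increment i by one", "Literal"),
    ("print the sum", "Literal"),
]


def featurize(comment):
    """Map a comment string to a dict of boolean features."""
    return {name: check(comment) for name, check in FEATURES.items()}


def entropy(labels):
    """Shannon entropy (in bits) of a non-empty list of labels."""
    total = len(labels)
    return -sum(n / total * math.log2(n / total) for n in Counter(labels).values())


def info_gain(rows, labels, feature):
    """Entropy reduction obtained by splitting on one boolean feature."""
    yes = [lab for row, lab in zip(rows, labels) if row[feature]]
    no = [lab for row, lab in zip(rows, labels) if not row[feature]]
    if not yes or not no:
        return 0.0  # a split that sends every row one way is useless
    n = len(labels)
    return entropy(labels) - len(yes) / n * entropy(yes) - len(no) / n * entropy(no)


def build_tree(rows, labels, features):
    """ID3-style tree: recursively split on the feature with the highest gain."""
    majority = Counter(labels).most_common(1)[0][0]
    if len(set(labels)) == 1 or not features:
        return majority  # leaf: pure node or no features left
    best = max(features, key=lambda f: info_gain(rows, labels, f))
    if info_gain(rows, labels, best) <= 0:
        return majority  # no split improves purity
    rest = [f for f in features if f != best]
    node = {"feature": best}
    for value in (True, False):
        idx = [i for i, row in enumerate(rows) if row[best] == value]
        node[value] = build_tree([rows[i] for i in idx], [labels[i] for i in idx], rest)
    return node


def predict(tree, row):
    """Follow feature tests from the root until a leaf label is reached."""
    while isinstance(tree, dict):
        tree = tree[row[tree["feature"]]]
    return tree


TREE = build_tree([featurize(c) for c, _ in TRAIN], [lab for _, lab in TRAIN], list(FEATURES))

if __name__ == "__main__":
    # Expected: License, Commented Code, Insufficient
    for comment in ["Copyright (c) 2020 ACME", "sum += arr[i];", "counter"]:
        print(f"{comment!r} -> {predict(TREE, featurize(comment))}")
```

Hand-written features keep the tree small and its decisions easy to explain to students, which suits feedback generation. A realistic setup would instead use a library implementation such as scikit-learn's `DecisionTreeClassifier` and measure accuracy on a held-out set of manually labeled comments, which is how a figure like the 85% reported above would be obtained.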
Subject
Computational Mathematics, Computational Theory and Mathematics, Numerical Analysis, Theoretical Computer Science
Cited by 2 articles
1. Source Code Summarization & Comment Generation with NLP: A New Index Proposal. 2024 International Congress on Human-Computer Interaction, Optimization and Robotic Applications (HORA), 2024-05-23.
2. MPI-RICAL: Data-Driven MPI Distributed Parallelism Assistance with Transformers. Proceedings of the SC '23 Workshops of The International Conference on High Performance Computing, Network, Storage, and Analysis, 2023-11-12.