Classification of Bugs in Cloud Computing Applications Using Machine Learning Techniques

Author:

Tabassum Nadia (1), Namoun Abdallah (2), Alyas Tahir (3), Tufail Ali (4), Taqi Muhammad (3), Kim Ki-Hyung (5)

Affiliation:

1. Department of Computer Science, Virtual University of Pakistan, Lahore 54000, Pakistan

2. Faculty of Computer and Information Systems, Islamic University of Madinah, Medina 42351, Saudi Arabia

3. Department of Computer Science, Lahore Garrison University, Lahore 54000, Pakistan

4. School of Digital Science, Universiti Brunei Darussalam, Tungku Link, Bandar Seri Begawan BE1410, Brunei

5. Department of Cyber Security, Ajou University, Suwon 16499, Republic of Korea

Abstract

In software development, a central problem is recognizing security-oriented issues within reported bugs, because current identification fails at an unacceptable rate and does not provide satisfactory reliability on customer and software datasets. Misclassified bug reports directly reduce the effectiveness, and therefore the accuracy, of a bug prediction model. Manually reviewing bug reports can solve this problem, but doing so is time-consuming and tedious for developers and testers. To address these issues, this paper proposes a novel hybrid approach based on natural language processing (NLP) and machine learning, whose intended outcomes are multi-class supervised classification and bug prioritization using supervised classifiers. After collection, the dataset was prepared for vectorization, subjected to exploratory data analysis, and preprocessed. TF-IDF and word2vec are the feature extraction and selection methods applied to the bag-of-words representation. Machine learning models are built once the dataset has been fully transformed. This study proposes, develops, and assesses four classifiers: multinomial Naive Bayes, decision tree, logistic regression, and random forest. After hyper-parameter tuning, random forest performed best, with 91.73% test accuracy and 100% training accuracy. The SMOTE technique was used to balance the highly imbalanced dataset, which was initially created for the justified classification. A comparison of models trained on the balanced and imbalanced datasets showed the importance of balancing for classification: the balanced-dataset models performed better in all experiments.
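The two data-preparation steps named in the abstract, TF-IDF vectorization and SMOTE balancing, can be illustrated with a minimal standard-library sketch. Everything below is an assumption made for illustration: the toy bug reports, the "security" and "other" labels, the function names tfidf and smote, the whitespace tokenizer, and the smoothed IDF variant are not taken from the paper, whose exact weighting and SMOTE settings the abstract does not state. The four classifiers are not shown.

```python
import math
import random
from collections import Counter


def tfidf(docs):
    """Return (vocabulary, rows): one TF-IDF vector per document.

    Uses raw term counts and a smoothed IDF, ln((1 + n) / (1 + df)) + 1.
    This is one common variant, assumed here for illustration.
    """
    tokenized = [d.lower().split() for d in docs]
    vocab = sorted({t for doc in tokenized for t in doc})
    n = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(t for doc in tokenized for t in set(doc))
    idf = {t: math.log((1 + n) / (1 + df[t])) + 1 for t in vocab}
    rows = []
    for doc in tokenized:
        tf = Counter(doc)
        rows.append([tf[t] * idf[t] for t in vocab])
    return vocab, rows


def smote(minority, n_new, k=1, seed=0):
    """Create n_new synthetic minority vectors (needs >= 2 minority samples).

    Each synthetic point lies on the line segment between a randomly chosen
    minority sample and one of its k nearest minority neighbours.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        x = minority[i]
        others = [m for j, m in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda m: math.dist(x, m))[:k]
        nb = rng.choice(neighbours)
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([a + gap * (b - a) for a, b in zip(x, nb)])
    return synthetic


# Hypothetical toy data: two "security" reports against four "other" reports.
reports = [
    "password leak in auth token",
    "token leak exposes password hash",
    "ui button misaligned on login page",
    "login page crashes on submit",
    "report export button does nothing",
    "page load is slow on dashboard",
]
labels = ["security", "security", "other", "other", "other", "other"]

vocab, rows = tfidf(reports)
minority = [r for r, y in zip(rows, labels) if y == "security"]
n_needed = labels.count("other") - labels.count("security")
synthetic = smote(minority, n_needed, k=1)

# Balanced dataset: original vectors plus synthetic minority vectors.
balanced_rows = rows + synthetic
balanced_labels = labels + ["security"] * len(synthetic)
```

In practice, oversampling of this kind is usually applied to the training split only, so that synthetic points do not leak into the test set; whether the paper did so is not stated in the abstract.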

Funder

MSIT

KIAT

Basic Science Research Program through the National Research Foundation of Korea

Publisher

MDPI AG

Subject

Fluid Flow and Transfer Processes, Computer Science Applications, Process Chemistry and Technology, General Engineering, Instrumentation, General Materials Science
