Software Defect Prediction Based on GUHA Data Mining Procedure and Multi-Objective Pareto Efficient Rule Selection
Author:
Mishra Bharavi1, Shukla K.K.1
Affiliation:
1. Department of Computer Engineering, Indian Institute of Technology (BHU), Varanasi, India
Abstract
Software defect prediction, if effective, enables developers to distribute their testing effort efficiently and lets them focus on defect-prone modules. Testing every module is very resource consuming when the defects lie in only a fraction of the modules. Information about the fault-proneness of classes and methods can be used to develop new strategies that help reduce the overall development cost and increase customer satisfaction. Several machine learning strategies have been used in the recent past to identify defective modules. These models are built using publicly available historical software defect data sets. Most of the proposed techniques are not able to deal with the class imbalance problem efficiently. Therefore, it is necessary to develop a prediction model which consists of small, simple and comprehensible rules. Considering these facts, in this paper, the authors propose a novel defect prediction approach named GUHA based Classification Association Rule Mining algorithm (G-CARM), where “GUHA” stands for General Unary Hypothesis Automaton. The G-CARM approach is primarily based on Classification Association Rule Mining, and deploys a two-stage process involving attribute discretization and rule generation using GUHA. GUHA is one of the oldest yet still very powerful methods of pattern mining. The basic idea of the GUHA procedure is to mine the interesting attribute patterns that indicate defect proneness. The new method has been compared against five other models reported in recent literature, viz. Naive Bayes, Support Vector Machine, RIPPER, J48 and Nearest Neighbour classifier, using several measures, including AUC and probability of detection. The experimental results indicate that the prediction performance of the G-CARM approach is better than that of the other prediction approaches.
The authors' approach achieved 76% mean recall and 83% mean precision for defective modules and 93% mean recall and 83% mean precision for non-defective modules on the CM1, KC1, KC2 and Eclipse data sets. Further, the defect rule generation process often generates a large number of rules, which require considerable effort when used as a defect predictor; hence, a rule subset selection process is also proposed to select the best set of rules according to the requirements. Evaluation criteria for defect prediction, such as sensitivity, specificity and precision, often compete against each other. It is therefore important to use multi-objective optimization algorithms for selecting prediction rules. In this paper the authors report prediction rules that are Pareto efficient in the sense that no further improvement in the rule set is possible without sacrificing some performance criterion. The Non-Dominated Sorting Genetic Algorithm has been used to find the Pareto front and the defect prediction rules.
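The notion of Pareto efficiency used above can be illustrated with a minimal sketch. It is not the authors' implementation and does not reproduce the Non-Dominated Sorting Genetic Algorithm; it only shows the dominance test that such algorithms rely on, applied to hypothetical rule sets scored on sensitivity, specificity and precision. The names `RuleScore`, `dominates` and `pareto_front`, and all numeric values, are assumptions made for illustration.

```python
from typing import List, NamedTuple


class RuleScore(NamedTuple):
    """Hypothetical scores of one candidate rule set (higher is better)."""
    name: str
    sensitivity: float
    specificity: float
    precision: float


def dominates(a: RuleScore, b: RuleScore) -> bool:
    """Return True if a is at least as good as b on every objective
    and strictly better on at least one."""
    a_obj, b_obj = a[1:], b[1:]  # skip the name field
    at_least_as_good = all(x >= y for x, y in zip(a_obj, b_obj))
    strictly_better = any(x > y for x, y in zip(a_obj, b_obj))
    return at_least_as_good and strictly_better


def pareto_front(scores: List[RuleScore]) -> List[RuleScore]:
    """Keep only candidates that no other candidate dominates."""
    return [s for s in scores if not any(dominates(o, s) for o in scores)]


# Illustrative values only; they are not results from the paper.
candidates = [
    RuleScore("R1", 0.80, 0.70, 0.60),
    RuleScore("R2", 0.70, 0.85, 0.65),
    RuleScore("R3", 0.75, 0.65, 0.55),  # worse than R1 on all three objectives
    RuleScore("R4", 0.60, 0.90, 0.70),
]

front = pareto_front(candidates)
front_names = [s.name for s in front]
print(front_names)
```

In this sketch R3 is dropped because R1 is better on every objective, while R1, R2 and R4 each trade one criterion against another and therefore all remain on the front. A genetic algorithm of the kind named in the abstract would additionally evolve the candidate rule subsets; here the candidates are simply given.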
Subject
Pharmacology (medical)