Optimized fuzzy clustering‐based k‐nearest neighbors imputation for mixed missing data in software development effort estimation

Published: 2023-01-04
ISSN: 2047-7473
Container-title: Journal of Software: Evolution and Process
Language: en
Short-container-title: J Software Evolu Process

Author:

Abnane Ibtissam¹, Idri Ali¹,², Abran Alain³

Affiliation:

1. Software Project Management Research Team, ENSIAS, Mohammed V University, Rabat, Morocco

2. Mohammed VI Polytechnic University, Ben Guerir, Morocco

3. Dept. of Software Engineering and Information Technology, ETS, University of Quebec, Montreal, Québec, Canada

Abstract

Context: Software development effort estimation (SDEE) is one of the most challenging aspects of project management. The presence of missing data (MD) in software attributes makes SDEE even more complex. K‐nearest neighbors imputation (KNNI) has been widely used in SDEE to deal with the MD issue. However, KNNI, in its classical process, has low tolerance to imprecision and uncertainty, especially when dealing with categorical features. When dealing with categorical attributes, KNNI uses a classical approach, employing mainly numbers or classical intervals to represent software attributes and similarity measures originally designed for numerical attributes.

Objectives: This paper evaluates the use of an optimized fuzzy clustering‐based KNNI (FC‐KNNI) and compares it with classical KNN when dealing with mixed data in the context of SDEE.

Methods: We investigate the effect of two imputation techniques (FC‐KNNI and KNNI) on five SDEE techniques: case‐based reasoning (CBR), fuzzy case‐based reasoning (F‐CBR), support vector regression, multilayer perceptron, and reduced‐error pruning tree. The evaluation is carried out on six publicly available SDEE datasets using two performance measures, standardized accuracy (SA) and Pred(0.25). The Wilcoxon statistical test is also performed to assess the significance of the results.

Results: The results are promising in the sense that using an imputation technique designed for mixed data is better than reusing methods originally designed for numerical data. We found that FC‐KNNI significantly outperforms KNNI regardless of the SDEE technique and dataset used. Another important finding is that F‐CBR improved the analogy process compared to CBR.

Conclusion: The introduction of fuzzy sets and fuzzy clustering in the analogy process improves its performance in terms of SA and Pred(0.25).
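The abstract names its two performance measures, SA and Pred(0.25), without defining them. The sketch below assumes the definitions commonly used in the SDEE literature: Pred(0.25) as the share of projects whose magnitude of relative error is at most 0.25, and SA as one minus the ratio of the model's mean absolute error to that of random guessing (computed here exactly over all other projects rather than by sampling). The function names and sample effort values are hypothetical and are not taken from the paper.

```python
from typing import Sequence


def pred(actual: Sequence[float], predicted: Sequence[float],
         level: float = 0.25) -> float:
    """Share of projects whose magnitude of relative error is <= level.

    Assumed definition; actual efforts must be non-zero.
    """
    hits = sum(1 for a, p in zip(actual, predicted)
               if abs(a - p) / a <= level)
    return hits / len(actual)


def standardized_accuracy(actual: Sequence[float],
                          predicted: Sequence[float]) -> float:
    """SA = 1 - MAE / MAE_p0 (assumed definition, returned as a ratio).

    MAE_p0 is the expected absolute error of "random guessing", i.e.
    predicting each project with the actual effort of another project.
    It is computed exactly over all ordered pairs instead of by sampling.
    """
    n = len(actual)
    mae = sum(abs(a - p) for a, p in zip(actual, predicted)) / n
    mae_p0 = sum(abs(actual[i] - actual[j])
                 for i in range(n) for j in range(n) if i != j) / (n * (n - 1))
    return 1 - mae / mae_p0


if __name__ == "__main__":
    # Hypothetical efforts (e.g. person-hours), for illustration only.
    actual = [100.0, 200.0, 400.0, 800.0]
    predicted = [110.0, 150.0, 600.0, 790.0]
    print("Pred(0.25):", pred(actual, predicted))
    print("SA:", standardized_accuracy(actual, predicted))
```

Higher values are better for both measures. An SA near zero would mean the estimator does no better than guessing from the other projects' efforts.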

Publisher

Wiley

Subject

Software

Link

https://onlinelibrary.wiley.com/doi/pdf/10.1002/smr.2529

References: 68 articles.

1. Systematic literature review of machine learning based software development effort estimation models

2. Missing data techniques in analogy-based software development effort estimation

3. Systematic literature review of ensemble effort estimation

4. Research patterns and trends in software effort estimation

5. Improved Analogy-based Effort Estimation with Incomplete Mixed Data

Cited by 1 article.

1. Exploring issues of story-based effort estimation in Agile Software Development (ASD); Science of Computer Programming; 2024-09