ESTIMATION OF MISSING VALUES USING OPTIMISED HYBRID FUZZY C-MEANS AND MAJORITY VOTE FOR MICROARRAY DATA

Published:2020-08-20 Issue:Number 4 Volume:19 Page:459-482
ISSN:2180-3862
Container-title:Journal of Information and Communication Technology
language:en
Short-container-title:JICT

Author:

Raja Kumaran Shamini¹, Othman Mohd Shahizan¹, Mi Yusuf Lizawati¹

Affiliation:

1. School of Computing, Universiti Teknologi Malaysia, Malaysia

Abstract

Missing values are a major constraint on the use of microarray technologies to identify disease-causing genes. Field experts routinely face the task of estimating these missing values. Imputation is an effective way to fill in suitable values so that downstream microarray analysis can proceed. Missing value imputation methods may also increase classification accuracy. Although these methods can predict the values, it is the classification accuracy rate that shows how well they estimate the missing values in gene expression data. In this study, a novel method, Optimised Hybrid of Fuzzy C-Means and Majority Vote (opt-FCMMV), was proposed to estimate the missing values in the data. Using the Majority Vote (MV) and optimisation through Particle Swarm Optimisation (PSO), this study predicted missing values in the data to form more informative and reliable data. To verify the effectiveness of opt-FCMMV, several experiments were carried out on two publicly available biomedical microarray datasets (Ovary and Lung Cancer) under three missing value mechanisms and five missing value percentages, using a Support Vector Machine (SVM) classifier. The experimental results showed that the proposed method achieved the highest accuracy rate compared with no imputation, imputation by Fuzzy C-Means (FCM), and imputation by Fuzzy C-Means with Majority Vote (FCMMV). For example, the accuracy rates for Ovary Cancer data with 5% missing values were 64.0% (no imputation), 81.8% (FCM), 90.0% (FCMMV), and 93.7% (opt-FCMMV). This outcome suggests that opt-FCMMV may also be applied in other domains to prepare datasets for various data mining tasks.
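As a rough illustration of the FCM baseline named in the abstract, the sketch below imputes missing entries with plain fuzzy c-means. It is not the authors' opt-FCMMV: it has no Majority Vote step and no PSO tuning. The function name `fcm_impute`, its parameters, the partial-distance handling of missing entries, and the toy data are all assumptions made for illustration.

```python
import numpy as np

def fcm_impute(X, n_clusters=2, m=2.0, n_iter=100, tol=1e-6, seed=0):
    """Fill NaNs in X (samples x features) from fuzzy c-means centroids.

    Hypothetical helper: plain FCM with a partial-distance strategy,
    not the opt-FCMMV method described in the abstract.
    """
    X = np.asarray(X, dtype=float)
    observed = ~np.isnan(X)                  # True where a value is known
    X0 = np.where(observed, X, 0.0)          # placeholder zeros, masked out below
    n, d = X.shape
    rng = np.random.default_rng(seed)
    U = rng.random((n, n_clusters))          # random initial memberships
    U /= U.sum(axis=1, keepdims=True)
    for _ in range(n_iter):
        W = U ** m
        # Centroids are computed from observed entries only.
        C = (W.T @ X0) / np.maximum(W.T @ observed.astype(float), 1e-12)
        # Partial squared distance, rescaled by the share of observed features.
        diff = (X0[:, None, :] - C[None, :, :]) * observed[:, None, :]
        n_obs = np.maximum(observed.sum(axis=1), 1)[:, None]
        d2 = np.maximum((diff ** 2).sum(axis=2) * d / n_obs, 1e-12)
        inv = d2 ** (-1.0 / (m - 1.0))
        U_new = inv / inv.sum(axis=1, keepdims=True)
        converged = np.abs(U_new - U).max() < tol
        U = U_new
        if converged:
            break
    # Each missing entry becomes a membership-weighted mean of the centroids.
    W = U ** m
    estimate = (W @ C) / W.sum(axis=1, keepdims=True)
    return np.where(observed, X, estimate)

# Toy data (made up): two well-separated groups, one missing value.
X = np.array([[1.0, 1.1], [0.9, 1.0], [1.1, np.nan],
              [5.0, 5.1], [5.1, 4.9], [4.9, 5.0]])
filled = fcm_impute(X, n_clusters=2)
print(filled[2, 1])
```

In a pipeline like the one the abstract describes, the filled matrix would then be passed to a classifier such as an SVM, and the number of clusters and the fuzzifier `m` are the kind of parameters an optimiser could tune.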

Publisher

UUM Press, Universiti Utara Malaysia

Subject

General Mathematics, General Computer Science

References: 41 articles.

1. Baraldi, P., Di Maio, F., Genini, D., & Zio, E. (2015). Reconstruction of missing data in multidimensional time series by fuzzy similarity. Applied Soft Computing, 26, 1–9. https://doi.org/10.1016/j.asoc.2014.09.038

2. Bertsimas, D., Pawlowski, C., & Zhuo, Y. D. (2017). From predictive methods to missing data imputation: An optimization method. The Journal of Machine Learning Research, 18(1), 7133–7171. Retrieved from http://jmlr.org/papers/v18/17-073.html

3. Bezdek, J. C., Coray, C., Gunderson, R., & Watson, J. (1981). Detection and characterization of cluster substructure I. Linear structure: Fuzzy c-lines. SIAM Journal on Applied Mathematics, 40(2), 339–357.

4. Bose, S., Das, C., Gangopadhyay, T., & Chattopadhyay, S. (2013, December). A modified local least square based missing value estimation method in microarray gene expression data. In 2013 2nd International Conference on Advanced Computing, Networking and Security, Mangalore, India (pp. 18–23). https://doi.org/10.1109/ADCONS.2013.11

5. Chattopadhyay, S., Das, C., & Bose, S. (2015, December). A novel biclustering based missing value prediction method for microarray gene expression data. In 2015 International Conference on Man and Machine Interfacing (MAMI), Bhubaneswar, India (pp. 1–6). https://doi.org/10.1109/MAMI.2015.7456603

Cited by 4 articles.

1. PSO-FCM Intelligent Algorithm in Computer Network Data Detection;Lecture Notes on Data Engineering and Communications Technologies;2023

2. Clustering column-mean quantile median: a new methodology for imputing missing data;Journal of Engineering and Applied Science;2022-12

3. Lung cancer detection by using probabilistic majority voting and optimization techniques;International Journal of Imaging Systems and Technology;2022-06-13

4. Estimation of Information Measures for Power-Function Distribution in Presence of Outliers and Their Applications;Journal of Information and Communication Technology;2021-11-11