Missing value imputation on gene expression data using bee-based algorithm to improve classification performance

Published:2024-08-29 Issue:8 Volume:19 Page:e0305492
ISSN:1932-6203
Container-title:PLOS ONE
language:en
Short-container-title:PLoS ONE

Author:

Chungnoy Kritanat, Tanantong Tanatorn, Songmuang Pokpong

Abstract

Existing missing value imputation methods focus on recovering the actual values so that a completed dataset can serve as input for machine learning tasks. This work instead proposes imputing missing values to improve classification accuracy. The proposed method is based on the bee algorithm and uses k-nearest neighbors with linear regression to guide the search toward an appropriate solution and limit randomness. During the process, the Gini importance score is used to select values for imputation. The imputed values are thus aimed at improving discriminative power in classification tasks rather than replicating the actual values of the original dataset. We evaluated the proposed method against frequently used imputation methods such as k-nearest neighbors, principal component analysis, and nonlinear principal component analysis, comparing root mean square error and the accuracy obtained when the imputed datasets are used in a classification task. The experimental results indicated that the proposed method obtained the best accuracy on all datasets compared with the other methods. Compared with the original dataset, classification models built from the imputed datasets yielded 15-25% higher accuracy in class prediction. Our analysis showed that the feature ranking used in the classification process was affected, leading to a noticeable change in informativeness, as the data imputed by the proposed method boosted discriminating power.
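
To make the comparison baseline concrete, the following is a minimal sketch of k-nearest-neighbor imputation scored by root mean square error on artificially masked entries. It is not the bee-based method described in the abstract; the function names (knn_impute, rmse), the toy matrix, the distance definition, and the choice k=1 are all assumptions made for illustration.

```python
import math


def knn_impute(rows, k=2):
    """Fill None entries with the mean of that column over the k nearest rows.

    Assumption: distance is the root mean squared difference over the
    columns that are observed in both rows.
    """
    def distance(a, b):
        shared = [(x - y) ** 2 for x, y in zip(a, b)
                  if x is not None and y is not None]
        return math.sqrt(sum(shared) / len(shared)) if shared else math.inf

    imputed = [list(r) for r in rows]
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            if value is not None:
                continue
            # Candidate neighbours: other rows with column j observed.
            candidates = sorted(
                (distance(row, other), other[j])
                for m, other in enumerate(rows)
                if m != i and other[j] is not None
            )
            nearest = [v for _, v in candidates[:k]]
            if nearest:  # leave the gap if no row has this column
                imputed[i][j] = sum(nearest) / len(nearest)
    return imputed


def rmse(truth, imputed, masked_cells):
    """Root mean square error over the cells that were hidden."""
    errors = [(truth[i][j] - imputed[i][j]) ** 2 for i, j in masked_cells]
    return math.sqrt(sum(errors) / len(errors))


# Toy expression matrix (rows = samples, columns = genes), illustrative only.
complete = [
    [1.0, 2.0, 3.0],
    [1.1, 2.1, 3.1],
    [5.0, 6.0, 7.0],
    [5.1, 6.1, 7.1],
]
masked_cells = [(0, 2), (3, 0)]

# Hide the chosen cells to simulate missing values.
observed = [list(r) for r in complete]
for i, j in masked_cells:
    observed[i][j] = None

imputed = knn_impute(observed, k=1)
score = rmse(complete, imputed, masked_cells)
print(f"RMSE on masked cells: {score:.3f}")
```

A low RMSE here only measures how closely the imputed values replicate the hidden ones. The abstract's point is that the proposed method targets classification accuracy instead, so a full comparison would also train a classifier on each imputed dataset.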

Funder

Scholarship for research promotion student for international and education in Faculty of Science and Technology Thammasat University

Thammasat University Research Unit in Data Innovation and Artificial Intelligence

Publisher

Public Library of Science (PLoS)
