Clustering column-mean quantile median: a new methodology for imputing missing data

Published:2022-12 Issue:1 Volume:69
ISSN:1110-1903
Container-title:Journal of Engineering and Applied Science
language:en
Short-container-title:J. Eng. Appl. Sci.

Author:

Yehia Nourhan, Wahed Manal Abdel, Mabrouk Mai Said

Abstract

DNA microarray data sets have been widely explored and used to analyze data without any previous biological background. However, analyzing them becomes challenging when data are missing. Machine learning techniques are therefore applied, because microarray technology is promising in genomics, especially in the analysis of gene expression data. Furthermore, gene expression data can describe in detail the transcription and translation of each piece of genetic information. In this study, a new system is proposed to impute more realistic values for missing data in a microarray dataset. The system was validated and evaluated on 42 samples of rectal cancer. Several evaluation tests were also conducted to confirm the effectiveness of the new system and to compare it with well-known imputation algorithms. The proposed clustering column-mean quantile median technique could predict highly informative missing genes, thereby reducing the difference between the original and imputed datasets and demonstrating its efficiency.

Publisher

Springer Science and Business Media LLC

Subject

General Engineering

Link

https://link.springer.com/content/pdf/10.1186/s44147-022-00148-7.pdf

Reference: 19 articles.

1. Tuikkala J (2006) Improving missing value estimation in microarray data with gene ontology. Bioinformatics 22(5):566–572

2. Liew AW-C (2011) Missing value imputation for gene expression data: computational techniques to recover missing data from available. Brief Bioinform 12(5):498–513

3. Li H (2014) A hybrid imputation approach for microarray missing value estimation. IEEE International Conference on Bioinformatics and Biomedicine

4. Shashirekha HL (2015) In: International Conference on Applied and Theoretical Computing and Communication Technology

5. Farswan A (2020) Imputation of gene expression data in blood cancer and its significance in inferring biological pathways. Front Oncol 9:1442