An efficient ensemble method for missing value imputation in microarray gene expression data

Published: 2021-04-13
Journal: BMC Bioinformatics (ISSN 1471-2105), Volume 22, Issue 1
Language: English

Authors:

Zhu Xinshan, Wang Jiayu, Sun Biao, Ren Chao, Yang Ting, Ding Jie

Abstract

Background: Genomic data analysis has been widely used to study disease genes and drug targets. However, missing values in genomic datasets pose a significant problem that severely hinders the use of these data. Current imputation methods based on a single learner often exploit only a limited part of the information in the known genomic data, which degrades imputation performance.

Results: In this study, multiple single imputation methods are combined into one imputation method through ensemble learning. In the ensemble method, bootstrap sampling is applied when each component method predicts the missing values, and these predictions are weighted and summed to produce the final prediction. The optimal weights are learned from the known gene data by minimizing a cost function of the imputation error, and a closed-form expression for the optimal weights is derived. Additionally, the performance of the ensemble method is analyzed theoretically in terms of the sum of squared regression errors. The proposed method is evaluated on several typical genomic datasets and compared with state-of-the-art imputation methods at different noise levels, sample sizes and missing-data rates. Experimental results show that the proposed method improves imputation accuracy, robustness and generalization.

Conclusion: The ensemble method achieves superior imputation performance because it uses the known data more efficiently: it integrates diverse imputation methods and learns the integration weights in a data-driven way.
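The abstract does not specify the component methods, the exact cost function, any constraint on the weights, or how the bootstrap step is organized, so the following is only a minimal sketch of the general idea under stated assumptions. It assumes three hypothetical component imputers (gene mean, sample mean, and a basic KNN over genes), weights constrained to sum to one, and, in place of the paper's bootstrap scheme, repeated random hiding of known entries. Under those assumptions, minimizing the squared error of the weighted prediction on the hidden entries gives the closed form w = C^{-1}1 / (1^T C^{-1} 1), where C = E^T E and E holds each component's errors. The function names `learn_weights` and `ensemble_impute` are illustrative, not taken from the paper.

```python
import numpy as np

def row_mean_impute(X):
    """Hypothetical component 1: fill each missing entry with its gene's (row's) observed mean."""
    out = X.copy()
    r, c = np.where(np.isnan(out))
    out[r, c] = np.nanmean(X, axis=1)[r]
    return out

def col_mean_impute(X):
    """Hypothetical component 2: fill each missing entry with its sample's (column's) observed mean."""
    out = X.copy()
    r, c = np.where(np.isnan(out))
    out[r, c] = np.nanmean(X, axis=0)[c]
    return out

def knn_impute(X, k=3):
    """Hypothetical component 3: average the k nearest genes (RMS distance on co-observed columns)."""
    out = X.copy()
    miss = np.isnan(X)
    row_means = np.nanmean(X, axis=1)
    for i in np.where(miss.any(axis=1))[0]:
        scored = []
        for j in range(X.shape[0]):
            shared = ~miss[i] & ~miss[j]
            if j != i and shared.any():
                scored.append((np.sqrt(np.mean((X[i, shared] - X[j, shared]) ** 2)), j))
        nbrs = [j for _, j in sorted(scored)[:k]]
        for c in np.where(miss[i])[0]:
            vals = [X[j, c] for j in nbrs if not miss[j, c]]
            out[i, c] = np.mean(vals) if vals else row_means[i]  # fall back to gene mean
    return out

COMPONENTS = [row_mean_impute, col_mean_impute, knn_impute]

def learn_weights(X, n_rounds=10, frac=0.05, ridge=1e-8, seed=0):
    """Learn sum-to-one weights from known entries (assumed scheme, not the paper's exact one).

    Each round hides a random fraction of known entries, imputes them with every
    component, and records the errors. Because sum(w) = 1, the ensemble error on
    each hidden entry is E @ w, so we minimize ||E w||^2 = w^T C w with C = E^T E.
    """
    rng = np.random.default_rng(seed)
    known = np.argwhere(~np.isnan(X))
    size = max(1, int(frac * len(known)))
    errors = []  # one row per hidden entry, one column per component
    for _ in range(n_rounds):
        pick = known[rng.choice(len(known), size=size, replace=False)]
        X_masked = X.copy()
        X_masked[pick[:, 0], pick[:, 1]] = np.nan
        truth = X[pick[:, 0], pick[:, 1]]
        preds = [f(X_masked)[pick[:, 0], pick[:, 1]] for f in COMPONENTS]
        errors.append(np.column_stack([p - truth for p in preds]))
    E = np.vstack(errors)
    E = E[np.isfinite(E).all(axis=1)]  # drop entries some component could not predict
    C = E.T @ E + ridge * np.eye(E.shape[1])  # small ridge keeps C invertible
    ones = np.ones(E.shape[1])
    z = np.linalg.solve(C, ones)
    return z / (ones @ z)  # closed form: C^{-1} 1 / (1^T C^{-1} 1)

def ensemble_impute(X, weights):
    """Weighted sum of the component imputations; observed entries are left unchanged."""
    filled = sum(w * f(X) for w, f in zip(weights, COMPONENTS))
    return np.where(np.isnan(X), filled, X)
```

A typical call is `w = learn_weights(X)` followed by `X_hat = ensemble_impute(X, w)`. With the sum-to-one constraint the weights can turn negative when two components make strongly correlated errors. Adding a non-negativity constraint would remove the simple closed form, so that variant is not shown here.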

Funder

National Natural Science Foundation of China

Opening Project of State Key Laboratory of Digital Publishing Technology

Publisher

Springer Science and Business Media LLC

Subject

Applied Mathematics, Computer Science Applications, Molecular Biology, Biochemistry, Structural Biology

Link

https://link.springer.com/content/pdf/10.1186/s12859-021-04109-4.pdf

References (49 articles)

1. Kristensen VN, Kelefiotis D, Kristensen T, Borresen-Dale A-L. High-throughput methods for detection of genetic variation. Biotechniques. 2001;30(2):318–33.

2. Muro S, Takemasa I, Oba S, Matoba R, Ueno N, Maruyama C, Yamashita R, Sekimoto M, Yamamoto H, Nakamori S, Monden M, Ishii S, Kato K. Identification of expressed genes linked to malignancy of human colorectal carcinoma by parametric clustering of quantitative expression data. Genome Biol. 2003;4(R21):1–10.

3. Mirus JE, Zhang Y, Li CI, Lokshin AE, Prentice RL, Hingorani SR, Lampe PD. Cross-species antibody microarray interrogation identifies a 3-protein panel of plasma biomarkers for early diagnosis of pancreas cancer. Clin Cancer Res. 2015;21(7):1764–71.

4. Wang W, Iyer NG, Tay HT, Wu Y, Lim TK, Zheng L, Song IC, Kwoh CK, Huynh H, Tan PO. Microarray profiling shows distinct differences between primary tumors and commonly used preclinical models in hepatocellular carcinoma. BMC Cancer. 2015;15:828.

5. Shipp MA, Ross KN, Tamayo P, Weng AP, Kutok JL, Aguiar RCT, Gaasenbeek M, Angelo M, Reich M, Pinkus GS, Ray TS, Koval MA, Last KW, Norton A, Lister TA, Mesirov J, Neuberg DS, Lander ES, Aster JC, Golub TR. Diffuse large B-cell lymphoma outcome prediction by gene-expression profiling and supervised machine learning. Nat Med. 2002;8(1):68–74.

Cited by 17 articles

1. Optimised multiple data partitions for cluster-wise imputation of missing values in gene expression data. Expert Systems with Applications. 2024-12.

2. NNAWA: A Granular Nearest Neighbor Imputation Technique Based on Alpha-Weighted Average. 2024 IEEE International Conference on Fuzzy Systems (FUZZ-IEEE). 2024-06-30.

3. Integrative Analysis of Genomic Data Types and AI Methodologies in Healthcare Applications. 2024 2nd International Conference on Cyber Resilience (ICCR). 2024-02-26.

4. Summarising multiple clustering-centric estimates with OWA operators for improved KNN imputation on microarray data. Fuzzy Sets and Systems. 2023-12.

5. Ensemble Technique for Imputing Missing Values in MAR Missingness. 2023 6th International Conference on Contemporary Computing and Informatics (IC3I). 2023-09-14.