An efficient ensemble method for missing value imputation in microarray gene expression data

Author:

Zhu Xinshan, Wang Jiayu, Sun Biao, Ren Chao, Yang Ting, Ding Jie

Abstract

Background: Genomic data analysis is widely used to study disease genes and drug targets. However, missing values in genomic datasets pose a significant problem that severely hinders the use of these data. Current imputation methods based on a single learner often exploit only part of the information in the known genomic data, which degrades imputation performance.

Results: In this study, multiple single imputation methods are combined into one imputation method by ensemble learning. In the ensemble method, bootstrap sampling is applied to obtain predictions of the missing values from each component method, and these predictions are weighted and summed to produce the final prediction. The optimal weights are learned from known gene data by minimizing a cost function of the imputation error, and their expression is derived in closed form. Additionally, the performance of the ensemble method is investigated analytically in terms of the sum of squared regression errors. The proposed method is tested on several typical genomic datasets and compared with state-of-the-art imputation methods at different noise levels, sample sizes, and missing-data rates. Experimental results show that the proposed method achieves improved imputation performance in terms of accuracy, robustness, and generalization.

Conclusion: The ensemble method achieves superior imputation performance because it uses known data more efficiently for missing-data imputation, by integrating diverse imputation methods and learning the integration weights in a data-driven way.
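The weighting step described in the abstract can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration, not the authors' implementation: the function names (`row_mean_impute`, `col_mean_impute`, `learn_weights`, `ensemble_impute`), the two mean-based component imputers, the hide-and-predict scheme on known entries, and the ordinary least-squares cost with a small ridge term. The abstract does not state the paper's cost function or component methods, and the bootstrap sampling step is omitted here.

```python
import numpy as np

def _safe_nanmean(X, axis):
    """Mean over non-missing values; empty slices fall back to the global mean."""
    counts = np.sum(~np.isnan(X), axis=axis)
    sums = np.nansum(X, axis=axis)
    return np.where(counts > 0, sums / np.maximum(counts, 1), np.nanmean(X))

def row_mean_impute(X):
    """Hypothetical component 1: fill each missing entry with its row (gene) mean."""
    filled = X.copy()
    rows, _ = np.where(np.isnan(X))
    filled[np.isnan(X)] = _safe_nanmean(X, axis=1)[rows]
    return filled

def col_mean_impute(X):
    """Hypothetical component 2: fill each missing entry with its column (sample) mean."""
    filled = X.copy()
    _, cols = np.where(np.isnan(X))
    filled[np.isnan(X)] = _safe_nanmean(X, axis=0)[cols]
    return filled

def learn_weights(X, imputers, hide_frac=0.1, ridge=1e-8, seed=0):
    """Hide some known entries, predict them with each component, and solve
    the least-squares problem min_w ||y - P w||^2 in closed form.
    Assumed cost function; the paper's may differ (e.g. constrained weights)."""
    rng = np.random.default_rng(seed)
    known = np.argwhere(~np.isnan(X))
    n_hide = max(1, int(hide_frac * len(known)))
    pick = known[rng.choice(len(known), size=n_hide, replace=False)]
    rows, cols = pick[:, 0], pick[:, 1]
    y = X[rows, cols]                      # true values of the hidden entries
    X_masked = X.copy()
    X_masked[rows, cols] = np.nan
    # One column of predictions per component method.
    P = np.column_stack([f(X_masked)[rows, cols] for f in imputers])
    # Normal equations; the tiny ridge term only guards against singularity.
    A = P.T @ P + ridge * np.eye(P.shape[1])
    return np.linalg.solve(A, P.T @ y)

def ensemble_impute(X, imputers, weights):
    """Replace missing entries with the weighted sum of component predictions."""
    preds = np.stack([f(X) for f in imputers], axis=-1)  # shape (genes, samples, K)
    return np.where(np.isnan(X), preds @ weights, X)
```

A usage call under these assumptions would be `w = learn_weights(X, [row_mean_impute, col_mean_impute])` followed by `ensemble_impute(X, [row_mean_impute, col_mean_impute], w)`, where `X` is a genes-by-samples array with `np.nan` marking missing values. Known entries are returned unchanged; only the missing ones receive the weighted prediction.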

Funder

National Natural Science Foundation of China

Opening Project of State Key Laboratory of Digital Publishing Technology

Publisher

Springer Science and Business Media LLC

Subject

Applied Mathematics, Computer Science Applications, Molecular Biology, Biochemistry, Structural Biology

Cited by 17 articles.

1. Optimised multiple data partitions for cluster-wise imputation of missing values in gene expression data;Expert Systems with Applications;2024-12

2. NNAWA: A Granular Nearest Neighbor Imputation Technique Based on Alpha-Weighted Average;2024 IEEE International Conference on Fuzzy Systems (FUZZ-IEEE);2024-06-30

3. Integrative Analysis of Genomic Data Types and AI Methodologies in Healthcare Applications;2024 2nd International Conference on Cyber Resilience (ICCR);2024-02-26

4. Summarising multiple clustering-centric estimates with OWA operators for improved KNN imputation on microarray data;Fuzzy Sets and Systems;2023-12

5. Ensemble Technique for Imputing Missing Values in MAR Missingness;2023 6th International Conference on Contemporary Computing and Informatics (IC3I);2023-09-14
