Distributed Based Serial Regression Multiple Imputation for High Dimensional Multivariate Data in Multicore Environment of Cloud

Author:

Lavanya K. 1, Reddy L.S.S. 2, Reddy B. Eswara 3

Affiliation:

1. Research Scholar, Department of Computer Science & Engineering, JNTUA College of Engineering, Anantapur, India

2. Professor, Department of Computer Science & Engineering, KL University, Guntur, India

3. Professor, Department of Computer Science & Engineering, JNTUA College of Engineering, Anantapur, India

Abstract

Multiple imputation (MI) is predominantly applied in processes that handle large amounts of missing data. Multivariate data that follow traditional statistical models suffer greatly when pertinent data are unavailable, and insufficient high-dimensional multivariate data are the biggest hurdle facing distributed computing research. This work mainly deals with the analysis of parallel input problems found in cloud computing networks in general and with the evaluation of high-performance computing in particular. It is difficult to use parallel multiple input methods in a way that both achieves remarkable performance and scales to huge datasets. Doing so requires a credible data system and a decomposition strategy that partitions the workload across the whole process with minimal data dependence. Moderate synchronization and low communication overhead are then needed so that the parallel imputation methods scale to more processes. This article proposes several novel applications to improve efficiency. First, it suggests distributed serial regression multiple imputation to improve the efficiency of the imputation task on high-dimensional multivariate normal data. Second, the processing is carried out with three different parallel back ends: multiple imputation using the socket method to serve serial regression, the fork method to distribute work over workers, and the same experiments in a dynamic structure with a load-balancing mechanism. Finally, the set of distributed MI methods is used to analyze experimentally the amplitude of imputation scores across three probable scenarios in the range of 1:500.
Further, to assess the efficiency of the imputation methods, the data are arranged with missing rates ranging from 10% to 50% (low to high) for datasets of between 1,000 and 100,000 samples. The experiments are run in a cloud environment and demonstrate that a decent speedup can be obtained by reducing repetitive communication between processors.
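The idea of running independent imputations on separate workers can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation. The function names (`make_data`, `impute_once`, `run_imputations`, `pool_rubin`), the two-variable simulated data, and the use of Python's `fork` and `spawn` start methods as loose stand-ins for the fork and socket back ends are all hypothetical choices made for this example. The imputation step is a simplified stochastic regression with a single predictor, and it does not draw the regression parameters from their posterior as a full MI procedure would.

```python
import multiprocessing
import random
from concurrent.futures import ProcessPoolExecutor
from statistics import fmean, variance


def make_data(n, missing_rate, seed):
    """Simulate (x, y) rows with y = 2x + noise, then blank out a share of y."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        y = 2.0 * x + rng.gauss(0.0, 0.5)
        rows.append((x, None if rng.random() < missing_rate else y))
    return rows


def impute_once(task):
    """One stochastic regression imputation of y on x.

    Returns (mean of completed y, estimated variance of that mean).
    """
    rows, seed = task
    rng = random.Random(seed)
    obs = [(x, y) for x, y in rows if y is not None]
    mean_x = fmean([x for x, _ in obs])
    mean_y = fmean([y for _, y in obs])
    sxx = sum((x - mean_x) ** 2 for x, _ in obs)
    slope = sum((x - mean_x) * (y - mean_y) for x, y in obs) / sxx
    intercept = mean_y - slope * mean_x
    sse = sum((y - intercept - slope * x) ** 2 for x, y in obs)
    resid_sd = (sse / (len(obs) - 2)) ** 0.5
    # Fill each missing y with its prediction plus random residual noise.
    completed = [
        y if y is not None else intercept + slope * x + rng.gauss(0.0, resid_sd)
        for x, y in rows
    ]
    return fmean(completed), variance(completed) / len(completed)


def run_imputations(rows, m, workers=1, chunksize=1, start_method=None):
    """Run m imputations serially or on a process pool.

    Each task carries its own seed, so results do not depend on which
    worker runs it. chunksize=1 hands out tasks one at a time; a larger
    chunksize sends them to workers in fixed batches.
    """
    tasks = [(rows, seed) for seed in range(m)]
    if workers == 1:
        return [impute_once(t) for t in tasks]
    ctx = multiprocessing.get_context(start_method)  # None = platform default
    with ProcessPoolExecutor(max_workers=workers, mp_context=ctx) as pool:
        return list(pool.map(impute_once, tasks, chunksize=chunksize))


def pool_rubin(results):
    """Combine m (estimate, variance) pairs with Rubin's rules (m >= 2)."""
    m = len(results)
    estimates = [q for q, _ in results]
    q_bar = fmean(estimates)                   # pooled point estimate
    u_bar = fmean([u for _, u in results])     # within-imputation variance
    b = variance(estimates)                    # between-imputation variance
    return q_bar, u_bar + (1 + 1 / m) * b
```

A call such as `run_imputations(make_data(1000, 0.3, seed=1), m=20, workers=4)` would typically be placed under an `if __name__ == "__main__":` guard, which process pools need on platforms that start workers as fresh interpreters. Because the imputations are independent, the only communication is shipping each task to a worker and collecting its small result.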

Publisher

IGI Global

Subject

Software


