Affiliation:
1. University of the Punjab
Abstract
In multiple regression, several techniques are available for situations in which the predictors are numerous and multicollinear. Some of these approaches rely on correlation and others on principal components. To cope with influential observations (outliers, leverage points, or both) in the data matrix, this paper proposes two techniques: Robust Correlation Based Regression (RCBR) and Robust Correlation Scaled Principal Component Regression (RCSPCR). The proposed methods are compared with existing methods, namely traditional Principal Component Regression (PCR), Correlation Scaled Principal Component Regression (CSPCR), and Correlation Based Regression (CBR). In addition, Macro (Missingness and cellwise and row-wise outliers) RCSPCR is proposed to handle multicollinearity, high dimensionality, outliers, and missing observations simultaneously. The proposed techniques are assessed on several simulated scenarios with appropriate levels of contamination. The results indicate that the proposed techniques appear to be more reliable for analyzing data containing missing values and outliers. Real-life data applications are also used to illustrate the performance of the proposed methods.
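To illustrate why a correlation-based method may need a robust correlation measure, the following minimal sketch compares the classical Pearson correlation with a rank-based (Spearman) correlation on a small data set before and after one gross outlier is introduced. The abstract does not specify which robust correlation estimator RCBR and RCSPCR use, so Spearman's coefficient is only a stand-in chosen for illustration, and the data values and function names are hypothetical.

```python
import math


def pearson(x, y):
    """Classical Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def ranks(values):
    """Ranks starting at 1; tied values receive their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r


def spearman(x, y):
    """Rank-based correlation: Pearson correlation of the ranks.

    Assumption: used here only as a simple example of a correlation
    measure that is less sensitive to outliers; it is not claimed to be
    the estimator used in the paper.
    """
    return pearson(ranks(x), ranks(y))


# Hypothetical data: y is almost exactly linear in x.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0]
y_clean = [2.1, 3.9, 6.2, 8.1, 9.8, 12.2, 13.9, 16.1, 18.0, 20.1]
# Contaminate the last observation with one gross outlier.
y_contaminated = y_clean[:-1] + [-50.0]

print("clean:        pearson=%.3f spearman=%.3f"
      % (pearson(x, y_clean), spearman(x, y_clean)))
print("contaminated: pearson=%.3f spearman=%.3f"
      % (pearson(x, y_contaminated), spearman(x, y_contaminated)))
```

On this toy data a single contaminated row is enough to turn the Pearson correlation negative, while the rank-based coefficient stays positive. This is the kind of sensitivity that motivates replacing the classical correlation in correlation-based regression when influential observations are present.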
Subject
Geometry and Topology, Statistics and Probability, Algebra and Number Theory, Analysis