Weighted Maximum Likelihood Correlation Coefficient to Handle Missing Values and Outliers in Data Set
Author:
Sinsomboonthong Juthaphorn (1), Sinsomboonthong Saichon (2)
Affiliation:
1. Department of Statistics, Faculty of Science, Kasetsart University, Bangkok, 10900, THAILAND
2. Department of Statistics, School of Science, King Mongkut’s Institute of Technology Ladkrabang, Bangkok, 10520, THAILAND
Abstract
This paper proposes an estimator, the weighted maximum likelihood (WML) correlation coefficient, for measuring the relationship between two variables when the dataset contains missing values and outliers. The estimator is derived by applying the conditional probability function to handle missing values and by giving more weight to values near the center, while outliers are assigned only a slight weight. These techniques make the proposed method robust when the preliminary assumptions of the data analysis are not met. To assess the quality of the proposed estimator, six correlation coefficients (WML, Pearson, median, percentage bend, biweight mid, and composite) are compared in a simulation study on two criteria: bias and mean squared error. The simulation results indicate that the WML estimator appears to perform best in the presence of missing values and outliers, especially for small sample sizes and a large percentage of outliers, regardless of the level of missing data. For large sample sizes, however, the median correlation coefficient appears to be a good estimator when the linear relationship between the two variables is above approximately 0.4, irrespective of the outlier and missing-data levels.
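The two evaluation criteria named in the abstract, bias and mean squared error, can be illustrated with a small Monte Carlo sketch. The code below does not implement the WML estimator, whose formula is not given here. It evaluates only the ordinary Pearson coefficient on complete pairs, under an assumed contamination scheme (a fraction of y-values replaced by draws from N(0, 10^2)) and an assumed missing-completely-at-random mechanism. The function names, parameter values, and data-generating settings are hypothetical choices for illustration and are not taken from the paper.

```python
import numpy as np


def pearson_complete_cases(x, y):
    """Pearson correlation using only pairs where both values are observed."""
    keep = ~(np.isnan(x) | np.isnan(y))
    return np.corrcoef(x[keep], y[keep])[0, 1]


def simulate_bias_mse(rho, n, n_reps, missing_rate, outlier_rate, seed=0):
    """Monte Carlo estimate of the bias and MSE of the Pearson coefficient.

    Assumed setup (illustrative only): bivariate normal data with true
    correlation rho, outliers injected into y, and y-values deleted
    completely at random.
    """
    rng = np.random.default_rng(seed)
    cov = np.array([[1.0, rho], [rho, 1.0]])
    estimates = np.empty(n_reps)
    for r in range(n_reps):
        data = rng.multivariate_normal([0.0, 0.0], cov, size=n)
        x, y = data[:, 0].copy(), data[:, 1].copy()
        # Replace a random fraction of y-values with high-variance noise.
        is_outlier = rng.random(n) < outlier_rate
        y[is_outlier] = rng.normal(0.0, 10.0, size=is_outlier.sum())
        # Mark a random fraction of y-values as missing.
        is_missing = rng.random(n) < missing_rate
        y[is_missing] = np.nan
        estimates[r] = pearson_complete_cases(x, y)
    bias = estimates.mean() - rho
    mse = np.mean((estimates - rho) ** 2)
    return bias, mse


bias_clean, mse_clean = simulate_bias_mse(0.6, 50, 500, 0.0, 0.0)
bias_dirty, mse_dirty = simulate_bias_mse(0.6, 50, 500, 0.1, 0.2)
print(f"clean data:        bias={bias_clean:.3f}, mse={mse_clean:.3f}")
print(f"contaminated data: bias={bias_dirty:.3f}, mse={mse_dirty:.3f}")
```

Under these assumed settings the Pearson coefficient is pulled toward zero by the contaminated values, which is the kind of degradation a robust estimator is meant to limit. Comparing several estimators would amount to swapping the estimator function while keeping the same bias and MSE computation.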
Publisher
World Scientific and Engineering Academy and Society (WSEAS)
Subject
General Mathematics