Application of Imputation Method for Compositional Data with Missing Values based on Adaptive LASSO Model: the Composition of Employment Industry in Taiyuan, China
-
Published: 2023-12-04
Issue: 6
Volume: 19
Pages: 1082-1098
-
ISSN: 2289-599X
-
Container-title: Malaysian Journal of Fundamental and Applied Sciences
Short-container-title: Mal. J. Fund. Appl. Sci.
Authors:
Tian Ying, Majahar Ali Majid Khan, Pei Shan Fam, Wu Lili, Mohd Jamaludin Siti Zulaikha
Abstract
The tripartite industry classification, which divides all economic activities into three sectors, is a classification method that reflects the dynamic process of economic development and the historical trend of change in the structure of resource allocation. The proportion of each industry has become an important indicator of a nation's level of economic development. These proportions are compositional data, a kind of complex multidimensional data used in many fields. All components of compositional data are non-negative and carry only relative information. In practice, compositional data may contain missing values. However, general statistical analysis methods cannot be applied directly to compositional data with missing values, and the complexity of missing values in compositional data makes traditional imputation methods unsuitable. How to carry out effective statistical inference for compositional data with missing values has therefore recently attracted the attention of many scholars. In this paper, we focus on the imputation problem for compositional data containing missing values and propose an Adaptive Least Absolute Shrinkage and Selection Operator (ALASSO) imputation method that obtains complete datasets through variable selection and parameter estimation. The new method is then evaluated through simulation and an empirical analysis, and compared with mean imputation, k-nearest neighbor imputation, and iterative regression imputation. The results show that the ALASSO imputation method achieves the highest accuracy across different missing rates, dimensions, and correlation coefficients.
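The abstract does not give the algorithm itself, so the following is only a minimal pure-Python sketch, under stated assumptions, of how adaptive-LASSO regression imputation might be applied to compositional data; it is not the authors' implementation. Everything in it is hypothetical: the function names (`closure`, `adaptive_lasso`, `impute_alasso`), the parameters (`lam`, `gamma`, `ridge`, `n_sweeps`), the small ridge fit that supplies initial coefficients for the weights w_j = 1/|b_j|^gamma (the usual adaptive-LASSO reweighting), the column-mean starting values, and the choice to model each missing part as a log-ratio against the geometric mean of the other parts. Working on log-ratios reflects the point that the components carry only relative information.

```python
import math


def closure(x):
    """Rescale positive parts so they sum to 1 (the closure operation)."""
    total = sum(x)
    return [v / total for v in x]


def soft_threshold(z, t):
    """Shrink z toward zero by t (the LASSO proximal step)."""
    return math.copysign(max(abs(z) - t, 0.0), z)


def adaptive_lasso(X, y, lam=0.01, gamma=1.0, ridge=1e-3, n_iter=200):
    """Adaptive LASSO via cyclic coordinate descent on centred data (sketch).

    Stage 1: a small ridge fit gives initial coefficients b0.
    Stage 2: LASSO with per-coefficient penalty lam / |b0_j|**gamma.
    Returns (intercept, coefficients)."""
    n, p = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    Xc = [[row[j] - x_mean[j] for j in range(p)] for row in X]
    yc = [v - y_mean for v in y]
    sq = [sum(r[j] ** 2 for r in Xc) / n for j in range(p)]

    def descend(penalty, l1):
        b, resid = [0.0] * p, yc[:]
        for _ in range(n_iter):
            for j in range(p):
                if sq[j] == 0.0:
                    continue  # constant predictor: leave its coefficient at 0
                # Correlation of x_j with the partial residual (b_j added back).
                rho = sum(Xc[i][j] * resid[i] for i in range(n)) / n + sq[j] * b[j]
                new = soft_threshold(rho, penalty[j]) / sq[j] if l1 else rho / (sq[j] + penalty[j])
                delta = new - b[j]
                if delta:
                    for i in range(n):
                        resid[i] -= Xc[i][j] * delta
                    b[j] = new
        return b

    b0 = descend([ridge] * p, l1=False)
    # Large weights (small |b0_j|) push weak predictors to exactly zero.
    weights = [1.0 / (abs(v) ** gamma + 1e-12) for v in b0]
    b = descend([lam * w for w in weights], l1=True)
    return y_mean - sum(b[j] * x_mean[j] for j in range(p)), b


def impute_alasso(data, lam=0.01, n_sweeps=5):
    """Hypothetical ALASSO regression imputation for compositions.

    data: rows of strictly positive parts, with None marking a missing part.
    For each part j, log(x_j / g(x_-j)) (g = geometric mean of the other
    parts) is regressed on additive log-ratios of the other parts, using rows
    where x_j is observed. Assumes D >= 3 and >= 1 observed value per column."""
    n, D = len(data), len(data[0])
    miss = [[v is None for v in row] for row in data]
    col_mean = []
    for j in range(D):
        obs = [row[j] for row in data if row[j] is not None]
        col_mean.append(sum(obs) / len(obs))
    # Crude starting values: column means of the observed parts.
    X = [[col_mean[j] if v is None else v for j, v in enumerate(row)] for row in data]

    def design(row, j):
        others = [k for k in range(D) if k != j]
        ref = others[-1]  # reference part for the additive log-ratios
        feats = [math.log(row[k] / row[ref]) for k in others[:-1]]
        log_g = sum(math.log(row[k]) for k in others) / len(others)
        return feats, log_g

    for _ in range(n_sweeps):
        for j in range(D):
            train = [i for i in range(n) if not miss[i][j]]
            if len(train) == n:
                continue  # nothing to impute in this part
            feats, resp = [], []
            for i in train:
                f, log_g = design(X[i], j)
                feats.append(f)
                resp.append(math.log(X[i][j]) - log_g)
            intercept, coefs = adaptive_lasso(feats, resp, lam=lam)
            for i in range(n):
                if miss[i][j]:
                    f, log_g = design(X[i], j)
                    z = intercept + sum(c * v for c, v in zip(coefs, f))
                    X[i][j] = math.exp(log_g + z)
    # Only ratios matter, so close each completed row to sum to 1.
    return [closure(row) for row in X]
```

Calling `impute_alasso` on a handful of three-part rows in which one part is `None` returns closed rows (each summing to 1) in which the observed parts of an incomplete row keep their original ratios; only the missing part is estimated. Mean, k-nearest neighbor, or iterative regression imputation could be dropped into the same loop for a comparison, but this sketch does not reproduce the paper's results.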
Publisher
Penerbit UTM Press
Subject
General Physics and Astronomy,General Agricultural and Biological Sciences,General Biochemistry, Genetics and Molecular Biology,General Mathematics,General Chemistry