Affiliation:
1. School of Natural Sciences (SNS), National University of Sciences and Technology (NUST), Islamabad, Pakistan
Abstract
This article presents an algorithm for modeling categorical multicollinear data that balances model accuracy on test data against the number of features selected in the model. Multicollinear data are generated across scientific fields. Typically, some variables are noise, while others are influential with respect to the response variable. In mathematical and statistical modeling of public health data, both the features and the response are often categorical. Such datasets are usually collinear, and partial least squares (PLS) is a candidate method for them. In its default form, however, PLS does not perform feature selection and handles only quantitative features. Categorical PLS (Cat-PLS) was introduced recently. We implement regularized feature selection in Cat-PLS using filter-based feature selection with categorical association measures: Cramér's V, the Phi coefficient, Tschuprow's T coefficient, the contingency coefficient, Yule's Q, and Yule's Y. A Monte Carlo simulation with 100 runs indicates
is the better choice in terms of model performance, number of selected features, and interpretability for modeling stillbirths, which serve as the case study. The framework can be applied in related areas to explore and model similar data structures.
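The abstract names the association measures but does not say how they enter Cat-PLS. The sketch below is a hedged illustration only. It computes the six measures from their standard textbook definitions for a pair of categorical variables. It then uses one of them, Cramér's V, in a plain filter step that keeps features whose association with the response meets a cut-off. The function names (`association_measures`, `filter_select`), the 0.2 threshold, and the toy data are hypothetical. Phi is taken as the unsigned sqrt(chi2 / n). This is a generic filter, not the authors' regularized Cat-PLS procedure.

```python
import math
from collections import Counter


def contingency_table(x, y):
    """Cross-tabulate two equal-length categorical sequences into row lists."""
    if len(x) != len(y) or not x:
        raise ValueError("x and y must be non-empty and of equal length")
    rows = sorted(set(x), key=str)
    cols = sorted(set(y), key=str)
    counts = Counter(zip(x, y))
    return [[counts[(r, c)] for c in cols] for r in rows]


def chi_square(table):
    """Pearson chi-square statistic of independence, plus the total count n."""
    n = sum(sum(row) for row in table)
    row_tot = [sum(row) for row in table]
    col_tot = [sum(col) for col in zip(*table)]
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            exp = row_tot[i] * col_tot[j] / n  # expected count under independence
            if exp > 0:
                chi2 += (obs - exp) ** 2 / exp
    return chi2, n


def association_measures(x, y):
    """Chi-square-based measures for any r x c table; Yule's Q and Y only for 2 x 2."""
    table = contingency_table(x, y)
    r, c = len(table), len(table[0])
    chi2, n = chi_square(table)
    out = {
        "phi": math.sqrt(chi2 / n),  # unsigned phi coefficient
        "contingency_c": math.sqrt(chi2 / (chi2 + n)),
    }
    if min(r, c) > 1:
        out["cramers_v"] = math.sqrt(chi2 / (n * (min(r, c) - 1)))
        out["tschuprows_t"] = math.sqrt(chi2 / (n * math.sqrt((r - 1) * (c - 1))))
    if (r, c) == (2, 2):
        (a, b), (cc, d) = table
        ad, bc = a * d, b * cc
        if ad + bc > 0:
            out["yules_q"] = (ad - bc) / (ad + bc)
            out["yules_y"] = (math.sqrt(ad) - math.sqrt(bc)) / (math.sqrt(ad) + math.sqrt(bc))
    return out


def filter_select(features, response, measure="cramers_v", threshold=0.2):
    """Hypothetical filter: keep features whose association with the response >= threshold."""
    scores = {name: association_measures(col, response).get(measure, 0.0)
              for name, col in features.items()}
    selected = sorted((name for name, s in scores.items() if s >= threshold),
                      key=lambda name: -scores[name])
    return selected, scores


# Toy usage: one feature copies the response, the other is independent of it.
response = ["yes", "yes", "no", "no", "yes", "no", "yes", "no"]
features = {
    "f_copy": list(response),
    "f_noise": ["a", "b", "a", "b", "b", "a", "a", "b"],
}
selected, scores = filter_select(features, response)
print(selected)  # ['f_copy']
```

Ranking features one at a time by a bivariate measure is what makes this a filter method. It is cheap and interpretable, but it ignores interactions among features. Those are left to the downstream model.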
Subject
General Engineering, General Mathematics
Cited by
1 article.