Application of Principal Component Analysis in Dealing with Multicollinearity in Modelling Clinical Data

Author:

Mishra Akash, Nair N Sreekumaran, Harichandrakumar KT, Binu VS, Satheesh Santhosh

Abstract

Introduction: One of the stringent assumptions about covariates in Cox hazard and logistic regression modelling is that they should be independent of one another. Including correlated covariates as they are can distort the precision of the estimates because of multicollinearity. One way to deal with multicollinearity is the Principal Component Analysis (PCA) technique.

Aim: To demonstrate the application of PCA in dealing with correlated covariates when modelling time-to-event and case-control study data.

Materials and Methods: This study was conducted at Jawaharlal Institute of Postgraduate Medical Education and Research, Puducherry, India, from February 2021 to January 2022. Two datasets were used for the demonstration, one relating to a time-to-event outcome and one from a case-control study with a binary outcome, with lipids as the correlated covariates. Three sets of Cox regression models were used to demonstrate the change in hazard ratios, with 95% Confidence Intervals (CI), when evaluating the effect of the intervention at different times of lipid measurement. Model I evaluated the treatment/Body Mass Index (BMI) effect on the outcome while ignoring the effect of the lipid parameters. Model II evaluated the treatment/BMI effect on the outcome by incorporating the lipid variables but ignoring multicollinearity. Model III evaluated the treatment/BMI effect on the outcome by incorporating the lipid variables through principal component analysis, thereby adjusting for multicollinearity. Similarly, logistic regression was fitted using the same three sets of models to evaluate the effect of the exposure (BMI). The comparability of lipids between the two groups in each dataset was tested using Hotelling's T-squared statistic.

Results: The lipids measured at the 12th, 24th and 36th months differed significantly between the two groups in the first dataset, as did the lipids between cases and controls in the second dataset. In the first dataset, at baseline, the Hazard Ratios (HRs) were statistically similar irrespective of the model used, whereas for lipids measured at the 12th, 24th and 36th months the HRs decreased successively, with narrowing 95% CIs, from Model I to Model III. Further, at the 24th and 36th months, the HR in Model III was found to be significant. In the second dataset, the Odds Ratio (OR) was significant in all three models; it was almost the same in Models I and II but elevated in Model III.

Conclusion: Multicollinearity should be properly addressed before including correlated covariates in Cox hazard regression and logistic regression models. The PCA technique would be a favourable method for doing so.
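As a rough illustration of the Model III idea, the sketch below replaces a set of correlated covariates with principal component scores before they would enter a regression model. It is not the authors' code: the data are simulated, the four "lipid" columns, the helper names (`standardize`, `vif`, `pca_scores`) and the 95% explained-variance cut-off are all assumptions made for the example, and the abstract does not state how many components were retained.

```python
import numpy as np

def standardize(X):
    """Centre each column and scale it to unit sample variance."""
    return (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

def vif(X):
    """Variance inflation factor per column (diagonal of the inverse correlation matrix)."""
    R = np.corrcoef(X, rowvar=False)
    return np.diag(np.linalg.inv(R))

def pca_scores(X, var_explained=0.95):
    """Scores on the leading components explaining at least `var_explained` of the variance.

    The cut-off is an illustrative assumption, not a value taken from the paper.
    """
    Z = standardize(X)
    R = np.corrcoef(Z, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(R)      # eigh returns eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1]         # reorder so the largest comes first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    cumulative = np.cumsum(eigvals) / eigvals.sum()
    k = int(np.searchsorted(cumulative, var_explained) + 1)
    return Z @ eigvecs[:, :k], k

# Simulated stand-in for four correlated lipid measurements (hypothetical data).
rng = np.random.default_rng(0)
n = 500
latent = rng.normal(size=n)                   # shared factor that induces the correlation
lipids = np.column_stack([
    latent + 0.3 * rng.normal(size=n),
    latent + 0.3 * rng.normal(size=n),
    -latent + 0.5 * rng.normal(size=n),
    latent + 0.6 * rng.normal(size=n),
])

raw_vif = vif(lipids)                         # inflated when covariates are collinear
scores, k = pca_scores(lipids)

# Component scores are mutually uncorrelated by construction.
score_corr = np.atleast_2d(np.corrcoef(scores, rowvar=False))
max_offdiag = np.abs(score_corr - np.diag(np.diag(score_corr))).max()

print("VIF of raw covariates:", np.round(raw_vif, 2))
print("components retained:", k)
print("largest correlation between component scores:", max_offdiag)
```

In a Model III style analysis, the columns of `scores` would then be entered as covariates alongside treatment or BMI in place of the raw lipid variables, using whichever Cox or logistic regression routine is at hand.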

Publisher

JCDR Research and Publications

Subject

Clinical Biochemistry, General Medicine
