Regression without regrets – initial data analysis is an essential prerequisite to multivariable regression

Published: 2023-11-14

Authors:

Heinze Georg¹, Baillie Mark², Lusa Lara³, Sauerbrei Willi⁴, Schmidt Carsten Oliver⁵, Harrell Frank E.⁶, Huebner Marianne⁷

Affiliations:

1. Medical University of Vienna

2. Novartis (Switzerland)

3. University of Primorska

4. University of Freiburg

5. University Medicine of Greifswald

6. Vanderbilt University

7. Michigan State University

Abstract

Statistical regression models are used for predicting outcomes based on the values of some predictor variables or for describing the association of an outcome with predictors. With a data set at hand, a regression model can be easily fit with standard software packages. This bears the risk that data analysts may rush to perform sophisticated analyses without sufficient knowledge of the basic properties of their data, the associations within it, and its errors, leading to incorrect interpretation and often questionable presentation of the modeling results. Ignorance about special features of the data, such as redundancies or particular distributions, may even invalidate the chosen analysis strategy. The main aim of initial data analysis (IDA) in the context of regression analyses is to provide knowledge about the data in order to confirm the appropriateness of, or to refine, a chosen model-building strategy, to interpret the modeling results correctly, and to guide the presentation of modeling results. To facilitate reproducibility, IDA needs to be preplanned, an IDA plan should be included in the general statistical analysis plan of a research project, and results should be well documented. Bias in statistical inference from the final regression model can be minimized if IDA abstains from evaluating associations between the outcome and the predictors, a key principle of IDA. We give advice on which aspects to consider in an IDA plan for data screening in the context of regression modeling to supplement the statistical analysis plan. We illustrate this IDA plan for data screening in an example of a typical diagnostic modeling project and give recommendations for data visualizations.

Publisher

Research Square Platform LLC

Cited by 3 articles.

1. Evaluating variable selection methods for multivariable regression models: A simulation study protocol;PLOS ONE;2024-08-09

2. Initial data analysis for longitudinal studies to build a solid foundation for reproducible analysis;PLOS ONE;2024-05-29

3. Initial data analysis for longitudinal studies to build a solid foundation for reproducible analysis;2023-12-06