Author:
Cao Yaqi,Ma Weidong,Zhao Ge,McCarthy Anne Marie,Chen Jinbo
Abstract
AbstractThe added value of candidate predictors for risk modeling is routinely evaluated by comparing the performance of models with or without including candidate predictors. Such comparison is most meaningful when the estimated risk by the two models are both unbiased in the target population. Very often data for candidate predictors are sourced from nonrepresentative convenience samples. Updating the base model using the study data without acknowledging the discrepancy between the underlying distribution of the study data and that in the target population can lead to biased risk estimates and therefore an unfair evaluation of candidate predictors. To address this issue assuming access to a well-calibrated base model, we propose a semiparametric method for model fitting that enforces good calibration. The central idea is to calibrate the fitted model against the base model by enforcing suitable constraints in maximizing the likelihood function. This approach enables unbiased assessment of model improvement offered by candidate predictors without requiring a representative sample from the target population, thus overcoming a significant practical challenge. We study theoretical properties for model parameter estimates, and demonstrate improvement in model calibration via extensive simulation studies. Finally, we apply the proposed method to data extracted from Penn Medicine Biobank to inform the added value of breast density for breast cancer risk assessment in the Caucasian woman population.
Funder
NHLBI Division of Intramural Research
Division of Cancer Epidemiology and Genetics, National Cancer Institute
Publisher
Springer Science and Business Media LLC
Reference25 articles.
1. Ankerst G, Gail M, Chatterjee N, Pfeiffer R (2016) Comparison of approaches for incorporating new information into existing risk prediction models. Stat Med 36(7):1134–56
2. Bondy M, Lustbader E, Halabi S, Ross E, Vogel V (1994) Validation of a breast cancer risk assessment model in women with a positive family history. J Natl Cancer Inst 86:620–5
3. Boyd S, Boyd S, Vandenberghe L (2004) Convex optimization. Cambridge University Press, Cambridge
4. Chatterjee N, Chen Y, Maas P, Carroll R (2016) Constrained maximum likelihood estimation for model calibration using summary-level information from external big data sources. J Am Stat Assoc 111:107–17
5. Costantino J, Gail M, Pee D, Anderson S, Redmond C, Benichou J, Wieand H (1999) Validation studies for models projecting the risk of invasive and total breast cancer incidence. J Natl Cancer Inst 91:1541–8