Abstract
Importance: Diagnostic codes are commonly used as inputs for clinical prediction models, to create labels for prediction tasks, and to identify cohorts for multicenter network studies. However, the coverage rates of diagnostic codes and their variability across institutions are underexplored.

Objective: The primary objective was to describe lab- and diagnosis-based labels for seven selected outcomes at three institutions. Secondary objectives were to describe agreement, sensitivity, and specificity of diagnosis-based labels against lab-based labels.

Methods: This study included three cohorts: SickKidsPeds from The Hospital for Sick Children, and StanfordPeds and StanfordAdults from Stanford Medicine. We included seven clinical outcomes with lab-based definitions: acute kidney injury, hyperkalemia, hypoglycemia, hyponatremia, anemia, neutropenia, and thrombocytopenia. For each outcome, we created four lab-based labels (abnormal, mild, moderate, and severe) based on test results and one diagnosis-based label. The proportion of admissions with a positive label was presented for each outcome, stratified by cohort. Using lab-based labels as the gold standard, agreement (Cohen's kappa), sensitivity, and specificity of the diagnosis-based label were calculated at each lab-based severity level.

Results: The numbers of admissions included were: SickKidsPeds (n=59,298), StanfordPeds (n=24,639), and StanfordAdults (n=159,985). The proportion of admissions with a positive diagnosis-based label was significantly higher for StanfordPeds than for SickKidsPeds across all outcomes, with odds ratios (99.9% confidence interval) for abnormal diagnosis-based label ranging from 2.2 (1.7-2.7) for neutropenia to 18.4 (10.1-33.4) for hyperkalemia. Lab-based labels were more similar across institutions. When using lab-based labels as the gold standard, Cohen's kappa and sensitivity were lower at SickKidsPeds than at StanfordPeds for all severity levels.

Conclusions: Across multiple outcomes, diagnosis codes were consistently different between the two pediatric institutions. This difference was not explained by differences in test results. These results may have implications for machine learning model development and deployment.
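As an illustration of the agreement analysis described in the Methods, the following is a minimal sketch of how Cohen's kappa, sensitivity, and specificity could be computed for one outcome when the lab-based label is treated as the gold standard. It is not the authors' code: the function name `agreement_metrics` and the encoding of labels as one 0/1 value per admission are assumptions made for this example.

```python
from typing import Dict, Sequence


def agreement_metrics(lab: Sequence[int], dx: Sequence[int]) -> Dict[str, float]:
    """Compare a diagnosis-based label (dx) with a lab-based label (lab).

    Both inputs are hypothetical 0/1 sequences with one entry per admission;
    the lab-based label is treated as the reference standard.
    """
    if len(lab) != len(dx):
        raise ValueError("lab and dx must have the same length")
    if len(lab) == 0:
        raise ValueError("at least one admission is required")

    # Cells of the 2x2 table, with the lab-based label as the reference.
    tp = sum(1 for l, d in zip(lab, dx) if l == 1 and d == 1)
    fn = sum(1 for l, d in zip(lab, dx) if l == 1 and d == 0)
    fp = sum(1 for l, d in zip(lab, dx) if l == 0 and d == 1)
    tn = sum(1 for l, d in zip(lab, dx) if l == 0 and d == 0)
    n = tp + fn + fp + tn

    # Observed agreement and agreement expected by chance.
    p_observed = (tp + tn) / n
    p_expected = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / (n * n)

    nan = float("nan")
    kappa = (p_observed - p_expected) / (1 - p_expected) if p_expected != 1 else nan
    sensitivity = tp / (tp + fn) if (tp + fn) > 0 else nan
    specificity = tn / (tn + fp) if (tn + fp) > 0 else nan

    return {"kappa": kappa, "sensitivity": sensitivity, "specificity": specificity}
```

In a setup like the one described, such a function would be called once per outcome, cohort, and lab-based severity level, each time passing the same diagnosis-based label and the lab-based label for that severity.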
Publisher
Cold Spring Harbor Laboratory
Cited by
1 article.