Author:
Jia Kai, Gu Bowen, Saowakon Pasapol, Kundrot Steven, Palchuk Matvey B., Warnick Jeff, Kaplan Irving D., Rinard Martin, Appelbaum Limor
Abstract
Background and Aims
Hepatocellular Carcinoma (HCC) is often diagnosed late, which limits curative treatment options. Conversely, early detection in cirrhotic patients through screening offers high cure rates, but screening is underutilized and misses cases that occur in individuals without cirrhosis. We aimed to build, validate, and simulate the deployment of models for HCC risk stratification using routinely collected Electronic Health Record (EHR) data from a geographically and racially diverse U.S. population.
Methods
We developed Logistic Regression (LiricLR) and Neural Network (LiricNN) models for the general population (GP) and the cirrhosis population, using EHR data from 46,79 HCC cases and 1,128,202 controls aged 40-100 years. Data were sourced from 64 Health Care Organizations (HCOs) in a federated network spanning academic medical centers, community hospitals, and outpatient clinics nationwide. We evaluated model performance using AUC, calibration plots, and the Geometric Mean of Overestimation (GMOE), defined as the geometric mean of the ratios of predicted to actual risks. External validation assessed performance across HCO locations, races, and time periods. Simulated deployment assessed sensitivity, specificity, Positive Predictive Value (PPV), and Number Needed to Screen (NNS) at each risk threshold.
Results
LiricLR and LiricNN (GP) achieved test set AUCs of 0.8968 (95% CI: 0.8925, 0.9010) and 0.9254 (95% CI: 0.9218, 0.9289), respectively, using 46 features, both established (cirrhosis, hepatitis, diabetes) and novel (frequency of clinical encounters, platelet, albumin, and aminotransferase values). Average external validation AUCs of LiricNN were 0.9274 (95% CI: 0.9239, 0.9308) across locations and 0.9284 (95% CI: 0.9247, 0.9320) across races. The average GMOE was 0.887 (95% CI: 0.862, 0.911). Simulated deployment of LiricNN provides performance metrics across multiple risk thresholds.
Conclusions
Liric models use routine EHR data to accurately predict the risk of HCC development. Their scalability, generalizability, and interpretability set the stage for future clinical deployment and the design of more effective screening programs.
Lay Summary
Hepatocellular Carcinoma (HCC), the most common liver cancer, is often diagnosed at late stages, limiting treatment options. Early detection through screening is essential for effective intervention and potential cure. However, current screening mostly targets patients with liver cirrhosis, many of whom do not get screened, and misses others who could develop HCC even without cirrhosis.
To improve screening, we created and tested Liric (LIver cancer RIsk Computation) models. These models use routine medical records from across the country to identify people at high risk of developing HCC.
Liric models have several benefits. First, they can increase awareness among primary care physicians (PCPs) nationwide, improving the utilization of HCC screening. This is particularly crucial in areas with socio-demographic disparities, where access to specialist physicians may be limited. Additionally, Liric models can identify patients who would be missed by current screening guidelines, ensuring a more comprehensive approach to HCC detection.
Liric can be integrated into EHR systems to automatically generate a risk score from routinely collected patient data. This risk score can provide valuable information to physicians and caregivers, helping them make informed decisions about the need for HCC screening, and can be used to develop cost-effective screening programs by identifying populations in which screening is effective.
Highlights
Screening detects HCC early but is underutilized and misses cases without cirrhosis
We developed, validated, and simulated the deployment of Liric to identify individuals at high risk for HCC
Liric uses routinely collected clinical and lab data from a diverse US population
Liric accurately predicts the risk of HCC 6-36 months before it occurs
Liric can assist PCPs in identifying the individuals most in need of screening
Impacts and implications
Effective screening for hepatocellular carcinoma (HCC) is vital to achieve early detection and improved cure rates. However, the existing screening approach primarily targets patients with liver cirrhosis; it is underutilized and fails to identify those without underlying cirrhosis.
Implementation of Liric models has the potential to enhance nationwide awareness among primary care physicians (PCPs) and to improve screening utilization for HCC, particularly in regions characterized by socio-demographic disparities. Furthermore, these models can help identify patients who are currently overlooked by existing screening guidelines and aid in the development of new, more effective guidelines.
Integration of Liric models into EHR systems via a federated network would enable automatic generation of risk scores from unfiltered patient data. This approach could more accurately identify at-risk patients, providing valuable information to caregivers for HCC screening.
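The Methods define GMOE only as the geometric mean of ratios of predicted to actual risks, and the simulated deployment lists sensitivity, specificity, PPV, and NNS per risk threshold without formulas. The Python sketch below shows one plausible way to compute these quantities; it is not the authors' implementation. The function names `gmoe` and `threshold_metrics`, the grouping of patients into equal-size predicted-risk bins for GMOE, and the definition NNS = flagged patients per detected case (that is, 1/PPV) are all assumptions made for illustration.

```python
import math
from typing import Dict, Sequence


def gmoe(pred: Sequence[float], label: Sequence[int], n_bins: int = 10) -> float:
    """Geometric Mean of Overestimation (GMOE), sketched over risk bins.

    Assumption (hypothetical binning): patients are sorted by predicted risk
    and split into n_bins equal-size groups; each group contributes the ratio
    mean(predicted risk) / observed HCC rate. Groups with no observed cases
    (or zero predicted risk) are skipped because the ratio or its log is
    undefined.
    """
    order = sorted(range(len(pred)), key=lambda i: pred[i])
    size = math.ceil(len(order) / n_bins)
    log_ratios = []
    for start in range(0, len(order), size):
        idx = order[start:start + size]
        predicted = sum(pred[i] for i in idx) / len(idx)
        observed = sum(label[i] for i in idx) / len(idx)
        if observed > 0 and predicted > 0:
            log_ratios.append(math.log(predicted / observed))
    if not log_ratios:
        raise ValueError("no bin has both predicted and observed risk > 0")
    # Geometric mean of the ratios = exp(arithmetic mean of their logs).
    return math.exp(sum(log_ratios) / len(log_ratios))


def threshold_metrics(pred: Sequence[float], label: Sequence[int],
                      threshold: float) -> Dict[str, float]:
    """Screening metrics when everyone with pred >= threshold is flagged."""
    tp = fp = tn = fn = 0
    for p, y in zip(pred, label):
        flagged = p >= threshold
        if flagged and y == 1:
            tp += 1
        elif flagged:
            fp += 1
        elif y == 1:
            fn += 1
        else:
            tn += 1
    sensitivity = tp / (tp + fn) if tp + fn else float("nan")
    specificity = tn / (tn + fp) if tn + fp else float("nan")
    ppv = tp / (tp + fp) if tp + fp else float("nan")
    # Assumption: NNS = patients flagged for screening per detected case = 1 / PPV.
    nns = (tp + fp) / tp if tp else float("inf")
    return {"sensitivity": sensitivity, "specificity": specificity,
            "ppv": ppv, "nns": nns}


if __name__ == "__main__":
    # Toy data only: six patients with predicted risks and HCC outcomes.
    risks = [0.02, 0.05, 0.10, 0.40, 0.60, 0.90]
    outcomes = [0, 0, 0, 1, 0, 1]
    for t in (0.3, 0.7):
        print(t, threshold_metrics(risks, outcomes, t))
```

Under this reading, a GMOE of 1 means predicted and observed risks agree on average, and a value below 1 (such as the reported 0.887) means predicted risks are, on average, somewhat lower than observed ones. Sweeping `threshold` over a grid reproduces the kind of per-threshold table a simulated deployment would report.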
Publisher
Cold Spring Harbor Laboratory