Data Farming to Table: Combined Use of a Learning Health System Infrastructure, Statistical Profiling, and Artificial Intelligence for Automating Toxicity and 3-year Survival for Quantified Predictive Feature Discovery from Real-World Data for Patients Having Head and Neck Cancers

Author:

Mayo Charles S, Su Shiqin, Rosen Benjamin, Covington Elizabeth, Zhang Zheng, Lawrence Theodore, Kudner Randi, Fuller Clifton, Brock Kristy K, Shah Jennifer, Mierzwa Michelle M

Abstract

Introduction: Clinicians iteratively adjust treatment approaches to improve outcomes, but to date automatable approaches for continuously learning risk factors as these adjustments are made have been lacking. We combined a large-scale, comprehensive, real-world Learning Health System infrastructure (LHSI) with automated statistical profiling, visualization, and artificial intelligence (AI) approaches to test evidence-based discovery of clinical factors for three use cases in head and neck cancer patients: dysphagia, xerostomia, and 3-year survival. Our hypothesis was that the combination would enable automated discovery of prognostic features that generate testable insights.

Methods: Records for 964 patients treated at a single institution for head and neck cancers with conventional fractionation between 2017 and 2022 were used. Information from the LHSI on demographics, diagnosis and staging, social determinants of health measures, chemotherapy, radiation therapy dose volume histogram curves and treatment details, laboratory values, and outcomes was combined to winnow evidence for 485 candidate prognostic features. Univariate statistical profiling, using benchmark resampling to detail confidence intervals for thresholds and metrics, detailed the predictive evidence of individual features. The metrics were area under the curve (AUC), sensitivity (SN), specificity (SP), F1, diagnostic odds ratio (DOR), p values for Wilcoxon Rank Sum (WRS) and Kolmogorov-Smirnov (KS) tests, and logistic fits of distributions. Statistical profiling was used to benchmark parsimonious XGBoost models, which were constructed with 10-fold cross validation using training (70%), validation (10%), and test (20%) sets. Probabilistic models utilizing the statistical profiling logistic fits of distributions were also used to benchmark the XGBoost models.

Results: Automated standardized analysis identified novel features and clinical thresholds. The validity of the automated findings was affirmed by supporting literature benchmarks.
Average incidence of dysphagia ≥ grade 3 within 1 year of treatment was low (11%). Xerostomia ≥ grade 2 (39% to 16%) and survival ≤ 3 years (25% to 15%) both decreased over the time range. The standard planning constraints in use limited the contribution of the corresponding features: Musc_Constrict_S: Mean[Gy] < 50, Glnd_Submand_High: Mean[Gy] ≤ 30, Glnd_Submand_Low: Mean[Gy] ≤ 10, Parotid_High: Mean[Gy] ≤ 24, Parotid_Low: Mean[Gy] ≤ 10. Additional prognostic features identified for dysphagia included Glnd_Submand_High: D1%[Gy] ≥ 71.1, Glnd_Submand_Low: D4%[Gy] ≥ 55.1, Musc_Constrict_S: D10%[Gy] ≥ 56.5, and GTV_Low: Mean[Gy] ≥ 71.3. The strongest feature for grade 2 xerostomia was Glnd_Submand_Low: D15%[Gy] ≥ 45.2, with a logistic model quantifying a gradual rather than an abrupt increase in probability: 13.5 + 0.18 (x - 41.0 Gy). The strongest prognostic factors for a lower likelihood of death by 3 years were GTV_High: Volume[cc] ≤ 21.1, GTV_Low: Volume[cc] ≤ 57.5, baseline Neutrophil-Lymphocyte Ratio (NLR) ≤ 5.6, Monocyte-Lymphocyte Ratio (MLR) ≤ 0.56, and Platelet-Lymphocyte Ratio (PLR) ≤ 202.5. All predictors had WRS and KS p values < 0.02. Statistical profiling enabled detailing the gains of XGBoost models with respect to individual features. Reductions over the time period in the distribution of GTV volumes correlated with reductions in death by 3 years.

Discussion: Confirming our hypothesis, automated, standardized statistical profiling with a set of statistical metrics and visualizations supported detailing the predictive strength and confidence intervals of individual features, benchmarking of subsequent AI models, and clinical assessment. The association of high dose values to submandibular gland volumes highlighted their relevance as surrogate measures for proximal un-contoured muscles, including the digastric muscles. Higher values of PLR, NLR, and MLR were associated with lower survival rates.
Combined use of Learning Health System Infrastructure, Statistical Profiling, and Artificial Intelligence provided a basis for faster, more efficient, evidence-based continuous learning of risk factors and for developing hypotheses testable in clinical trials. Benchmarking AI models against simple probabilistic models provided a means of understanding when results are driven by general areas of overall risk vs. more complex interactions.
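
The univariate profiling described in the Methods can be illustrated with a minimal sketch. It is not the authors' pipeline: the data are synthetic, all function names are hypothetical, the threshold is chosen by Youden's J, the DOR uses a 0.5 continuity correction, and the threshold interval comes from a percentile bootstrap, all of which are assumptions made for illustration. The WRS and KS tests and the logistic fits are omitted.

```python
import random


def profile(values, outcomes, threshold):
    """SN, SP, F1 and DOR for the rule 'value >= threshold predicts the event'."""
    tp = fp = fn = tn = 0
    for v, y in zip(values, outcomes):
        pred = v >= threshold
        if pred and y:
            tp += 1
        elif pred and not y:
            fp += 1
        elif not pred and y:
            fn += 1
        else:
            tn += 1
    sn = tp / (tp + fn) if (tp + fn) else 0.0
    sp = tn / (tn + fp) if (tn + fp) else 0.0
    f1 = 2 * tp / (2 * tp + fp + fn) if (2 * tp + fp + fn) else 0.0
    # Assumption: add 0.5 to every cell so the DOR stays finite with empty cells.
    dor = ((tp + 0.5) * (tn + 0.5)) / ((fp + 0.5) * (fn + 0.5))
    return {"SN": sn, "SP": sp, "F1": f1, "DOR": dor}


def auc(values, outcomes):
    """AUC as the chance an event case outranks a non-event case (ties count half)."""
    pos = [v for v, y in zip(values, outcomes) if y]
    neg = [v for v, y in zip(values, outcomes) if not y]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def best_threshold(values, outcomes):
    """Observed value that maximises Youden's J = SN + SP - 1 (assumed criterion)."""
    best_j, best_t = None, None
    for t in sorted(set(values)):
        m = profile(values, outcomes, t)
        j = m["SN"] + m["SP"] - 1
        if best_j is None or j > best_j:
            best_j, best_t = j, t
    return best_t


def bootstrap_threshold_ci(values, outcomes, n_boot=200, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the selected threshold."""
    rng = random.Random(seed)
    n = len(values)
    picks = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        v = [values[i] for i in idx]
        y = [outcomes[i] for i in idx]
        if 0 < sum(y) < n:  # skip resamples that contain only one class
            picks.append(best_threshold(v, y))
    picks.sort()
    lo = picks[int((alpha / 2) * (len(picks) - 1))]
    hi = picks[int((1 - alpha / 2) * (len(picks) - 1))]
    return lo, hi


# Synthetic toy data: a dose-like feature and a binary toxicity flag.
dose = [10, 20, 30, 40, 50, 60, 70, 80]
event = [0, 0, 0, 1, 0, 1, 1, 1]

threshold = best_threshold(dose, event)
print("threshold:", threshold)
print("metrics:", profile(dose, event, threshold))
print("AUC:", auc(dose, event))
print("threshold CI:", bootstrap_threshold_ci(dose, event))
```

Running the same functions once per candidate feature would give a table of thresholds, intervals, and metrics that can be ranked, which is the kind of output the abstract describes using to benchmark the XGBoost models.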

Publisher

Cold Spring Harbor Laboratory


Cited by 1 article.

1. Global Workforce and Access: Demand, Education, Quality;Seminars in Radiation Oncology;2024-10
