A Machine Learning Algorithm for the Detection of Paroxysmal Nocturnal Haemoglobinuria (PNH) in UK Primary Care Electronic Health Records

Authors:

Worker Amanda (1), Mahon Hadley (1), Sams Jack (1), Boardman-Pretty Freya (1), Marchini Elena (1), Dubis Rand (1), Warren Alan (1), Stockdale Jez (1), Kumar Jyothika (1), Varones Elizabeth (1), Ollerenshaw Daniel (1), Grant Calum (1), Fish Peter (1), Kelly Richard J (2)

Affiliations:

1. Mendelian

2. St. James’s University Hospital

Abstract

Background Paroxysmal Nocturnal Haemoglobinuria (PNH) is an ultra-rare, acquired disorder that is challenging to diagnose because of varied symptoms, heterogeneous patient presentations, and a lack of awareness among healthcare professionals. This leads to frequent misdiagnosis and delays in diagnosis. This study evaluated the feasibility of a machine learning model to identify undiagnosed PNH patients using structured electronic health records.

Methods The study used data from the Optimum Patient Care Research Database, which contains electronic health records from general practitioner (GP) practices across the United Kingdom. PNH patients were identified by the presence, and control patients by the absence, of a PNH diagnosis code in their records. Clinical features (symptoms, diagnoses, healthcare utilisation) from 131 patients in the PNH group and 593,838 patients in the control group were input to a tree-based XGBoost machine learning model to classify patients as either “positive” or “negative” for PNH suspicion. The algorithm was finalised after additional exclusions and inclusions were applied. Performance was assessed using positive predictive value (PPV), recall and specificity. As the sample used to develop the algorithm was not representative of the true population prevalence, PPV was additionally adjusted to reflect performance in the wider population.

Results Of all the patients in the PNH group, 27% were classified as positive (recall). Of the control group, 99.99% were classified as negative (specificity). Of all the patients classified as positive, 60.4% had a diagnosis of PNH in their record (PPV). The PPV adjusted for the population prevalence of PNH was 19.59%, suggesting that nearly 1 in 5 patients flagged may warrant further PNH investigation. The key clinical features in the model were aplastic anaemia, pancytopenia, haemolytic anaemia, myelodysplastic syndrome, and Budd-Chiari syndrome.

Conclusion This is the first study to combine clinical understanding of PNH with machine learning, demonstrating the ability to discriminate between PNH and control patients in retrospective electronic health records. With further investigation and validation, this algorithm could be deployed on live health data, potentially leading to earlier diagnosis for patients who currently experience long diagnostic delays or remain undiagnosed.
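
The three reported metrics, and the idea of adjusting PPV for population prevalence, can be illustrated with a short Python sketch. This is not the authors' code: the function names, the example counts, and the prevalence value are all hypothetical, and the adjustment shown is the standard Bayes' theorem form, which the abstract does not confirm as the method used. Because the abstract gives neither the population prevalence nor the unrounded specificity, the sketch is not expected to reproduce the reported 19.59%.

```python
def confusion_metrics(tp, fp, tn, fn):
    """Return (recall, specificity, ppv) from confusion-matrix counts."""
    recall = tp / (tp + fn)        # share of true cases flagged positive
    specificity = tn / (tn + fp)   # share of controls flagged negative
    ppv = tp / (tp + fp)           # share of flagged patients who are true cases
    return recall, specificity, ppv


def prevalence_adjusted_ppv(recall, specificity, prevalence):
    """PPV expected at a given prevalence, via Bayes' theorem.

    Assumes recall and specificity carry over unchanged from the
    development sample to the target population.
    """
    true_positive_mass = recall * prevalence
    false_positive_mass = (1.0 - specificity) * (1.0 - prevalence)
    return true_positive_mass / (true_positive_mass + false_positive_mass)


# Hypothetical counts, chosen only to keep the arithmetic easy to follow.
recall, specificity, sample_ppv = confusion_metrics(tp=30, fp=20, tn=980, fn=70)

# Hypothetical target prevalence, lower than in the sample above.
hypothetical_prevalence = 0.01
adjusted = prevalence_adjusted_ppv(recall, specificity, hypothetical_prevalence)

print(f"sample PPV:   {sample_ppv:.3f}")
print(f"adjusted PPV: {adjusted:.3f}")
```

With recall and specificity held fixed, PPV falls as the condition becomes rarer, so the adjusted figure is lower than the PPV measured in a sample where cases are over-represented. The result is also very sensitive to small changes in specificity when prevalence is very low.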

Publisher

Springer Science and Business Media LLC
