Large-scale deep learning analysis for the early diagnosis of primary immunodeficiencies

Author:

Papanastasiou Giorgos (1), Yang Guang (2), Fotiadis Dimitris (3), Dikaios Nikolaos (4), Wang Chengjia (5), Huda Ahsan (1), Sobolevsky Luba (6), Sidhu Gurinder (1), Palumbo Donna (1)

Affiliation:

1. Pfizer Inc, New York, NY, USA

2. Imperial College London, UK

3. Department of Biomedical Research, Institute of Molecular Biology and Biotechnology, FORTH, Ioannina, Greece

4. Mathematics Research Center, Academy of Athens, Athens, Greece

5. Heriot Watt University, UK

6. Immunoglobulin National Society, Woodland Hills, CA, USA

Abstract

Primary immunodeficiency (PID) is a group of heterogeneous disorders resulting from immune system defects. Early diagnosis of PID is compromised by its heterogeneous manifestations and by low clinical awareness. PID remains significantly underdiagnosed, leading to increased mortality, co-morbidities, healthcare visits and costs. Among PID, combined immunodeficiencies (CID) are characterized by complex immune defects, and common variable immunodeficiency (CVID) is among the most common types. Because treatments are available for CID and CVID, it is critical to systematize their early diagnosis. Our study objectives were two-fold. First, we developed and evaluated an accurate deep learning model that analyzes administrative medical claims data from electronic health records (EHRs), with the aim of systematizing screening and early identification of CID and CVID. Second, we identified the most important CID- and CVID-associated clinical phenotypes and their combinations, demonstrating a systematic methodology to improve early identification of these PID. All data consisted of medical claims derived from the Optum® de-identified EHR database. Four large cohorts were generated, with 797, 797, 2,312, and 19,924 CID/CVID cases and equal numbers of controls in Cohorts 1–4, respectively (a total of 47,660 cases and controls). Two deep learning models (TabMLPNet and TabResNet) were developed and compared against baseline models. Univariate logistic regression was used to calculate odds ratios across all clinical phenotypes and their combinations. The TabMLPNet model showed the highest diagnostic performance across cohorts, with sensitivity, specificity, and overall accuracy ranging from 0.82–0.88, 0.82–0.85, and 0.80–0.87, respectively.
For the first time, we identified distinctive combinations of antecedent phenotypes associated with CID/CVID in each cohort, consisting of respiratory infections/conditions, genetic anomalies, cardiac defects, autoimmune diseases, blood disorders and malignancies. Most of the phenotypes that emerged are well described in the literature, which supports our findings. Moreover, several less well documented individual phenotypes (i.e., asthma, coagulation defects complicating pregnancy, cancer of lymphoid histiocytic tissue, chronic lymphoid leukemia) were also identified, which can lead to better clinical surveillance of PID. We demonstrated a generalized and accurate method, evaluated on a large EHR-derived cohort of CID/CVID cases and controls. Our methodology can lead to the development of new clinical guidelines and pathways for earlier identification of the most important antecedent phenotypes and their combinations, enhance clinical awareness, and be used to improve PID diagnosis and outcomes at a population level.
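As an illustration of the quantities reported in the abstract, the following is a minimal sketch and not the authors' code. It assumes a single binary phenotype (present or absent), in which case the odds ratio from a univariate logistic regression equals the cross-product ratio of the 2x2 case/control table. The function names and all counts are hypothetical, chosen only for the example.

```python
import math


def odds_ratio(exposed_cases, unexposed_cases, exposed_controls, unexposed_controls):
    """Odds ratio for one binary phenotype, with a Wald 95% confidence interval.

    For a single binary predictor this matches exp(coefficient) of a
    univariate logistic regression fitted on the same 2x2 table.
    """
    a, b, c, d = exposed_cases, unexposed_cases, exposed_controls, unexposed_controls
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cell counts must be positive")
    estimate = (a * d) / (b * c)
    # Standard error of log(OR) under the usual large-sample approximation.
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(estimate) - 1.96 * se_log)
    upper = math.exp(math.log(estimate) + 1.96 * se_log)
    return estimate, lower, upper


def diagnostic_metrics(tp, fp, tn, fn):
    """Sensitivity, specificity and overall accuracy from confusion-matrix counts."""
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
    }


if __name__ == "__main__":
    # Hypothetical counts: phenotype present in 40 of 100 cases and 20 of 100 controls.
    print(odds_ratio(40, 60, 20, 80))
    # Hypothetical classifier results on 100 cases and 100 controls.
    print(diagnostic_metrics(tp=85, fp=18, tn=82, fn=15))
```

With equal numbers of cases and controls, as in the cohorts described above, overall accuracy is the mean of sensitivity and specificity.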

Publisher

Research Square Platform LLC
