Leveraging deep-learning on raw spirograms to improve genetic understanding and risk scoring of COPD despite noisy labels

Published: 2022-09-15

Authors:

Cosentino Justin, Behsaz Babak, Alipanahi Babak, McCaw Zachary R., Hill Davin, Schwantes-An Tae-Hwi, Lai Dongbing, Carroll Andrew, Hobbs Brian D., Cho Michael H., McLean Cory Y., Hormozdiari Farhad

Abstract

Chronic obstructive pulmonary disease (COPD), the third leading cause of death worldwide, is highly heritable. While COPD is clinically defined by applying thresholds to summary measures of lung function, a quantitative liability score has more power to identify new genetic signals. Here we train a deep convolutional neural network on noisy self-reported and ICD-based labels to predict COPD case/control status from high-dimensional raw spirograms and use the model predictions as a liability score. The machine-learning-based (ML-based) liability score accurately discriminates COPD cases and controls (AUROC = 0.82 ± 0.01) and COPD-related hospitalization (AUROC = 0.89 ± 0.01) without any domain-specific knowledge. Moreover, the ML-based liability score is associated with overall survival (hazard ratio = 1.22 ± 0.01; P ≤ 2 × 10⁻¹⁶) and exacerbation events (R² = 0.10 ± 0.01; P ≤ 4 × 10⁻¹⁰¹). A genome-wide association study on the ML-based liability score replicates existing COPD and lung function loci, but also identifies 67 new loci. Thirty-eight of these have supportive evidence in independent datasets, including a locus near LTBR. We demonstrate the biological plausibility of the novel variants through enrichment analyses, phenome-wide association studies, and generalizability of COPD prediction in multiple datasets. These results provide an example of the potential to improve genetic discovery of disease-relevant variants by training deep neural networks to predict noisy labels from high-dimensional raw data.
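The liability-score idea can be illustrated with a small sketch. Assume the network outputs a predicted probability of COPD for each individual. One common choice is to use the logit of that probability as a continuous phenotype. That choice is an assumption here and not necessarily the authors' exact transform. Case/control discrimination is then measured with the rank-based AUROC. The function names `liability_from_probability` and `auroc` are hypothetical. The toy probabilities and labels are made up for illustration. This is not the authors' pipeline.

```python
import math
from typing import Sequence


def liability_from_probability(p: float, eps: float = 1e-6) -> float:
    """Map a predicted case probability to a continuous logit-scale score.

    Assumption for illustration: the liability score is logit(p).
    Clipping to [eps, 1 - eps] avoids infinite values at p = 0 or p = 1.
    """
    p = min(max(p, eps), 1.0 - eps)
    return math.log(p / (1.0 - p))


def auroc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Rank-based AUROC (Mann-Whitney U statistic), counting ties as 1/2.

    labels: 1 for cases, 0 for controls. This O(n_cases * n_controls)
    loop is for clarity only, not for biobank-scale data.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for sp in pos:
        for sn in neg:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy usage with made-up model outputs and noisy labels.
probs = [0.9, 0.5, 0.4, 0.2, 0.6]
labels = [1, 1, 0, 0, 0]
scores = [liability_from_probability(p) for p in probs]
print(round(auroc(scores, labels), 3))  # 0.833
```

The logit is monotonic, so the AUROC is the same whether it is computed on probabilities or on logits. The transform matters for downstream steps such as a linear-model association test. In that setting, a continuous score that is not bounded to (0, 1) may be more convenient to work with.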

Publisher

Cold Spring Harbor Laboratory

Cited by 1 article.

1. An Empirical Study of ML-based Phenotyping and Denoising for Improved Genomic Discovery; 2022-11-18