Abstract
AbstractChronic Obstructive Pulmonary Disease (COPD) is a complex, heterogeneous disease. Traditional subtyping methods generally focus on either the clinical manifestations or the molecular endotypes of the disease, resulting in classifications that do not fully capture the disease’s complexity. Here, we bridge this gap by introducing a subtyping pipeline that integrates clinical and gene expression data with variational autoencoders. We apply this methodology to the COPDGene study, a large study of current and former smoking individuals with and without COPD. Our approach generates a set of vector embeddings, called Personalized Integrated Profiles (PIPs), that recapitulate the joint clinical and molecular state of the subjects in the study. Prediction experiments show that the PIPs have a predictive accuracy comparable to or better than other embedding approaches. Using trajectory learning approaches, we analyze the main trajectories of variation in the PIP space and identify five well-separated subtypes with distinct clinical phenotypes, expression signatures, and disease outcomes. Notably, these subtypes are more robust to data resampling compared to those identified using traditional clustering approaches. Overall, our findings provide new avenues to establish fine-grained associations between the clinical characteristics, molecular processes, and disease outcomes of COPD.
Publisher
Cold Spring Harbor Laboratory
Reference87 articles.
1. Global and regional mortality from 235 causes of death for 20 age groups in 1990 and 2010: a systematic analysis for the Global Burden of Disease Study 2010
2. Multilevel, dynamic chronic obstructive pulmonary disease heterogeneity. a challenge for personalized medicine;Annals of the American Thoracic Society,2016
3. Machine learning characterization of copd subtypes: insights from the copdgene study;Chest,2020
4. Subtyping: What it is and its role in precision medicine;IEEE Intelligent Systems,2015
5. Distinct copd subtypes in former smokers revealed by gene network perturbation analysis;Respiratory Research,2023