Abstract
Sputum induction is a non-invasive method to evaluate the airway environment, particularly for asthma. RNA sequencing (RNAseq) can be used on sputum, but it can be challenging to interpret because sputum contains a complex and heterogeneous mixture of human cells and exogenous (microbial) material. In this study, we developed a methodology that integrates dimensionality reduction and statistical modeling to grapple with the heterogeneity. We use this to relate bulk RNAseq data from 115 asthmatic patients with clinical information, microscope images, and single-cell profiles. First, we mapped sputum RNAseq to human and exogenous sources. Next, we decomposed the human reads into cell-expression signatures and fractions of these in each sample; we validated the decomposition using targeted single-cell RNAseq and microscopy. We observed enrichment of immune-system cells (neutrophils, eosinophils, and mast cells) in severe asthmatics. Second, we inferred microbial abundances from the exogenous reads and then associated these with clinical variables -- e.g., Haemophilus was associated with increased white blood cell count and Candida, with worse lung function. Third, we applied a generative model, latent Dirichlet allocation (LDA), to identify patterns of gene expression and microbial abundances and relate them to clinical data. Based on this, we developed a method called LDA-link that connects microbes to genes using reduced-dimensionality LDA topics. We found a number of known connections, e.g., between Haemophilus and the gene IL1B, which is highly expressed by mast cells. In addition, we identified novel connections, including Candida and the calcium-signaling gene CACNA1E, which is highly expressed by eosinophils. These results speak to the mechanism by which gene-microbe interactions contribute to asthma and define a strategy for making inferences in heterogeneous and noisy RNAseq datasets.
Publisher
Cold Spring Harbor Laboratory