Abstract
Identifying individuals with tuberculosis (TB) who have a high risk of onward transmission can guide disease prevention and public health strategies. Here, we train classification models that use demographic and disease data to predict the first sampled isolates in Mycobacterium tuberculosis transmission clusters. We find that supervised learning models, in particular balanced random forests, can discriminate between individuals with TB who are more likely to form transmission clusters and individuals who are unlikely to transmit further, with good model performance (AUCs ≥ 0.75). We also identify the most important patient and disease characteristics in the best-performing classification model, including patient demographics, site of infection, TB lineage, and age at diagnosis. This framework can be used to develop predictive tools for the early assessment of a patient’s transmission risk, so that individuals can be prioritised for enhanced follow-up with the aim of reducing further transmission.
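The abstract does not describe the implementation, so the following is only a minimal Python sketch, not the authors' pipeline, of two ideas it relies on. First, a balanced random forest trains each tree on a bootstrap sample that draws equally from both classes, so the rare positive class (here, hypothetically, 1 = first sampled isolate in a transmission cluster) is not swamped by the majority class. Second, the AUC equals the probability that a randomly chosen positive case receives a higher predicted score than a randomly chosen negative case. The function names, labels and scores below are hypothetical and for illustration only; in practice a library implementation such as imbalanced-learn's `BalancedRandomForestClassifier` would typically be used.

```python
import random
from typing import List, Sequence


def balanced_bootstrap(labels: Sequence[int], rng: random.Random) -> List[int]:
    """Hypothetical helper: row indices for ONE tree of a balanced random forest.

    Samples n_min rows with replacement from each class, where n_min is the
    size of the minority class, so both classes are equally represented.
    """
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    n_min = min(len(pos), len(neg))
    return rng.choices(pos, k=n_min) + rng.choices(neg, k=n_min)


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC as P(score of a random positive > score of a random negative).

    Ties count as one half; this is the Mann-Whitney U statistic divided
    by (number of positives * number of negatives).
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical example: 2 positives, 6 negatives, made-up model scores.
labels = [1, 0, 0, 1, 0, 0, 0, 0]
scores = [0.9, 0.2, 0.4, 0.35, 0.1, 0.5, 0.3, 0.05]

print(round(auc(labels, scores), 3))  # 10 of 12 positive/negative pairs ranked correctly
print(balanced_bootstrap(labels, random.Random(0)))  # 2 positive + 2 negative indices
```

The first print gives 0.833: the positive scored 0.9 outranks all six negatives, and the one scored 0.35 outranks four of them. Under this reading, an AUC of at least 0.75 means the model ranks a cluster-founding case above a non-transmitting case in at least three out of four such pairs.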
Publisher
Cold Spring Harbor Laboratory