Abstract
Detecting factors associated with transmission is important for understanding disease epidemics and for designing effective public health measures. Clustering and terminal branch length (TBL) analyses are commonly applied to genomic data sets of Mycobacterium tuberculosis (MTB) to identify sub-populations with increased transmission. Here, I used a simulation-based approach to investigate which epidemiological processes influence the results of clustering and TBL analyses, and whether differences in transmission can be detected with these methods. I simulated MTB epidemics with different dynamics (latency, infectious period, transmission rate, basic reproductive number R0, sampling proportion, and molecular clock), and found that all these factors, except the length of the infectious period and R0, affect clustering results and TBL distributions. I show that standard interpretations of these analyses ignore two main caveats: 1) clustering results and TBL depend on many factors that have nothing to do with transmission, and 2) clustering results and TBL do not reveal whether the epidemic is stable, growing, or shrinking. An important consequence is that the optimal SNP threshold for clustering depends on the epidemiological conditions, and that sub-populations with different epidemiological characteristics should not be analyzed with the same threshold. Finally, these results suggest that the different clustering rates and TBL distributions found consistently between MTB lineages are probably due to intrinsic bacterial factors, and do not necessarily indicate differences in transmission or evolutionary success.
Publisher
Cold Spring Harbor Laboratory