Author:
Núñez Rafael C., Hart Gregory R., Famulare Michael, Lorton Christopher, Herbeck Joshua T.
Abstract
Since the coining of the term phylodynamics, the use of phylogenies to understand infectious disease dynamics has steadily increased. As methods for phylodynamics and genomic epidemiology have proliferated and grown more computationally expensive, the epidemiological information they extract has also evolved to better complement what can be learned through traditional epidemiological data. However, for genomic epidemiology to continue to grow, and for the accumulating number of pathogen genetic sequences to fulfill their potential for widespread utility, the extraction of epidemiological information from phylogenies needs to be simpler and more efficient. Summary statistics provide a straightforward way of extracting information from a phylogenetic tree, but the relationship between these statistics and epidemiological quantities needs to be better understood. In this work we address this need via simulation. Using two different benchmark scenarios, we evaluate 74 tree summary statistics and their relationship to epidemiological quantities. In addition to evaluating the epidemiological information that can be inferred from each summary statistic, we also assess the computational cost of each statistic, which helps us optimize the selection of summary statistics for specific applications. Our study offers guidelines on essential considerations for designing or choosing summary statistics. The evaluated set of summary statistics, along with additional helpful functions for phylogenetic analysis, is accessible through an open-source Python library. Our research not only illuminates the main characteristics of many tree summary statistics but also provides valuable computational tools for real-world epidemiological analyses. These contributions aim to enhance our understanding of disease spread dynamics and advance the broader utilization of genomic epidemiology in public health efforts.
Author Summary
Our study focuses on the use of phylogenetic analysis to gain valuable epidemiological insights. We conducted a simulation study to evaluate 74 phylogenetic summary statistics and their relationship to epidemiological quantities, shedding light on the potential of each statistic to quantify different characteristics of disease spread dynamics. We also assessed the computational cost of each statistic, which provides additional information when selecting a statistic for a particular application. The summary statistics we evaluated are available through an open-source Python library. This work enhances our understanding of phylogenetic tree structures and contributes to the broader application of genomic epidemiology in public health initiatives.
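As a rough illustration of what a tree summary statistic and its computational cost look like in practice, the sketch below computes two classical statistics, the Sackin index and the cherry count, on a toy tree and times each call. It is not taken from the authors' library: the nested-tuple tree encoding and the names `tree`, `leaf_depths`, `sackin_index`, `cherry_count`, and `timed` are assumptions made for this example, and the two statistics are chosen only because they are widely known, not because the abstract names them.

```python
import time

# Hypothetical toy tree (an assumption for this sketch): a leaf is a string,
# an internal node is a tuple of its children.
tree = ((("A", "B"), "C"), ("D", "E"))


def leaf_depths(node, depth=0):
    """Return the depth of every leaf below `node` (the root has depth 0)."""
    if isinstance(node, str):
        return [depth]
    depths = []
    for child in node:
        depths.extend(leaf_depths(child, depth + 1))
    return depths


def sackin_index(node):
    """Sackin index: the sum of all leaf depths (larger means less balanced)."""
    return sum(leaf_depths(node))


def cherry_count(node):
    """Count internal nodes whose children are exactly two leaves."""
    if isinstance(node, str):
        return 0
    is_cherry = len(node) == 2 and all(isinstance(c, str) for c in node)
    return int(is_cherry) + sum(cherry_count(c) for c in node)


def timed(stat, node):
    """Return (value, elapsed seconds) for one evaluation of `stat`."""
    start = time.perf_counter()
    value = stat(node)
    return value, time.perf_counter() - start


for stat in (sackin_index, cherry_count):
    value, elapsed = timed(stat, tree)
    print(f"{stat.__name__}: value={value}, seconds={elapsed:.2e}")
```

On a tree this small the timings are dominated by noise; a meaningful cost comparison would repeat each call many times on trees of realistic size.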
Publisher
Cold Spring Harbor Laboratory