Abstract
Hierarchical clustering of pathogen genotypes is widely used to complement epidemiologic investigations of outbreaks. Investigators must dissect trees to obtain genetic partitions that provide epidemiologists with meaningful information. Statistical approaches to tree dissection often require a user-defined parameter to predict the optimal partition number and augmenting this parameter can drastically impact resultant partition memberships. Here, we demonstrate how to optimize a given tree dissection parameter to maximize accuracy irrespective of the tree dissection method used. We hierarchically clustered 1,873 genotypes of the foodborne pathogen Cyclospora spp., including 587 possessing links to historic outbreaks. We dissected the resulting tree using a statistical method requiring users to select the value of a ‘stringency parameter’ (s), with a recommended value of 95% to 99.5%. We dissected this hierarchical tree across s-values from 94% to 99.5% (at increments of 0.25%), to identify a value that maximized partitioning accuracy, defined as the degree to which genetic partitions conform to known epidemiologic groupings. We show that s-values of 96.5% and 96.75% yield the highest accuracy (> 99.9%) when clustering Cyclospora sp. isolates with known epidemiologic linkages. In practice, the optimized s-value will generate robust genetic partitions comprising isolates likely derived from a common food source, even when the epidemiologic grouping is not known prior to genetic clustering. While the s-value is specific to the tree dissection method used here, the optimization approach described could be applied to any parameter/method used to dissect hierarchical trees.
Publisher
Public Library of Science (PLoS)
Reference23 articles.
1. Beyond the SNP Threshold: Identifying Outbreak Clusters Using Inferred Transmissions;J Stimson;Mol Biol Evol,2019
2. Rapid Open-Source SNP-Based Clustering Offers an Alternative to Core Genome MLST for Outbreak Tracing in a Hospital Setting;J Szarvas;Front Microbiol,2021
3. Nonparametric Binary Classification to Distinguish Closely Related versus Unrelated P. falciparum Parasites;MM Plucinski;Am J Trop Med Hyg,2021
4. Epidemiologic utility of a framework for partition number selection when dissecting hierarchically clustered genetic data evaluated on the intestinal parasite Cyclospora cayetanensis;JLN Barratt;American Journal of Epidemiology,2022
5. Cyclospora cayetanensis comprises at least 3 species that cause human cyclosporiasis;JLN Barratt;Parasitology,2022
Cited by
2 articles.
订阅此论文施引文献
订阅此论文施引文献,注册后可以免费订阅5篇论文的施引文献,订阅后可以查看论文全部施引文献