Optimizing hierarchical tree dissection parameters using historic epidemiologic data as ‘ground truth’

Author:

Jacobson David, Barratt Joel

Abstract

Hierarchical clustering of pathogen genotypes is widely used to complement epidemiologic investigations of outbreaks. Investigators must dissect trees to obtain genetic partitions that provide epidemiologists with meaningful information. Statistical approaches to tree dissection often require a user-defined parameter to predict the optimal partition number, and adjusting this parameter can drastically change the resulting partition memberships. Here, we demonstrate how to optimize a given tree dissection parameter to maximize accuracy, irrespective of the tree dissection method used. We hierarchically clustered 1,873 genotypes of the foodborne pathogen Cyclospora spp., including 587 with links to historic outbreaks. We dissected the resulting tree using a statistical method that requires users to select the value of a ‘stringency parameter’ (s), with a recommended value of 95% to 99.5%. We dissected this hierarchical tree across s-values from 94% to 99.5% (in increments of 0.25%) to identify a value that maximized partitioning accuracy, defined as the degree to which genetic partitions conform to known epidemiologic groupings. We show that s-values of 96.5% and 96.75% yield the highest accuracy (>99.9%) when clustering Cyclospora spp. isolates with known epidemiologic linkages. In practice, the optimized s-value will generate robust genetic partitions comprising isolates likely derived from a common food source, even when the epidemiologic grouping is not known prior to genetic clustering. While the s-value is specific to the tree dissection method used here, the optimization approach described could be applied to any parameter or method used to dissect hierarchical trees.
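The parameter sweep described in the abstract can be sketched in a few lines of Python. This is a minimal illustration and not the authors' implementation: `dissect` is a hypothetical callable standing in for whatever tree dissection method is in use (it takes an s-value and returns a mapping from isolate ID to partition label), and `pairwise_accuracy` is an assumed, simplified accuracy measure (the fraction of isolate pairs on which the genetic partition and the epidemiologic grouping agree). The abstract does not give the exact accuracy formula, so this one may differ from the published definition.

```python
from itertools import combinations
from typing import Callable, Dict, List, Optional, Tuple


def s_grid(start: float = 94.0, stop: float = 99.5, step: float = 0.25) -> List[float]:
    """s-values from start to stop inclusive, as in the abstract (94% to 99.5% by 0.25%)."""
    n_steps = int(round((stop - start) / step))
    return [round(start + i * step, 2) for i in range(n_steps + 1)]


def pairwise_accuracy(partition: Dict[str, int], epi: Dict[str, str]) -> float:
    """Assumed accuracy measure: share of isolate pairs where 'same genetic
    partition' matches 'same epidemiologic group'. Only isolates with a known
    epidemiologic link are scored."""
    ids = sorted(i for i in epi if i in partition)
    agree = 0
    total = 0
    for a, b in combinations(ids, 2):
        same_genetic = partition[a] == partition[b]
        same_epi = epi[a] == epi[b]
        agree += same_genetic == same_epi
        total += 1
    return agree / total if total else 0.0


def optimize_s(
    dissect: Callable[[float], Dict[str, int]],  # hypothetical dissection function
    epi: Dict[str, str],
    grid: Optional[List[float]] = None,
) -> Tuple[List[float], float]:
    """Score every s-value on the grid and return all values tied for the best accuracy."""
    grid = grid if grid is not None else s_grid()
    scores = {s: pairwise_accuracy(dissect(s), epi) for s in grid}
    best = max(scores.values())
    return [s for s in grid if scores[s] == best], best
```

Returning every s-value tied for the top score, rather than a single winner, mirrors the abstract's result that two adjacent values (96.5% and 96.75%) performed equally well. Because `dissect` is passed in as a function, the same loop applies to any dissection method that exposes a tunable parameter.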

Publisher

Public Library of Science (PLoS)

Subject

Multidisciplinary

