Author:
Barratt Joel L N, Plucinski Mateusz M
Abstract
Comparing parasite genotypes to inform parasitic disease outbreak investigations involves computation of genetic distances that are typically analyzed by hierarchical clustering to identify related isolates, indicating a common source. A limitation of hierarchical clustering is that hierarchical clusters are not discrete; they are nested. Consequently, small groups of similar isolates exist within larger groups that get progressively larger as relationships become increasingly distant. Investigators must dissect hierarchical trees at a partition number ensuring grouped isolates belong to the same strain; a process typically performed subjectively, introducing bias into resultant groupings. We describe an unbiased, probabilistic framework for partition number selection that ensures partitions comprise isolates that are statistically likely to belong to the same strain. We computed distances and established a normalized distribution of background distances that we used to demarcate a threshold below which the closeness of relationships is unlikely to be random. Distances are hierarchically clustered and the dendrogram dissected at a partition number where most within-partition distances fall below the threshold. We evaluated this framework by partitioning 1,137 clustered Cyclospora cayetanensis genotypes, including 552 isolates epidemiologically linked to various outbreaks. The framework was 91% sensitive and 100% specific in assigning epidemiologically linked isolates to the same partition.
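The partition-selection step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and several details are assumptions made purely for illustration. The threshold is taken as a precomputed input, where the paper derives it from the background distance distribution. Average linkage stands in for whichever clustering method the authors used. "Most" within-partition distances is read as a hypothetical cutoff, `min_fraction`. All function names are invented for this example.

```python
from itertools import combinations


def average_linkage_partitions(dist):
    """Agglomerate items bottom-up; return {k: clusters} for every k.

    Average linkage is an assumption; the abstract does not name a method.
    """
    clusters = [[i] for i in range(len(dist))]
    partitions = {len(clusters): [c[:] for c in clusters]}
    while len(clusters) > 1:
        best = None
        # Find the pair of clusters with the smallest mean pairwise distance.
        for a, b in combinations(range(len(clusters)), 2):
            pairs = [dist[i][j] for i in clusters[a] for j in clusters[b]]
            avg = sum(pairs) / len(pairs)
            if best is None or avg < best[0]:
                best = (avg, a, b)
        _, a, b = best
        merged = clusters[a] + clusters[b]
        clusters = [c for idx, c in enumerate(clusters) if idx not in (a, b)]
        clusters.append(merged)
        partitions[len(clusters)] = [c[:] for c in clusters]
    return partitions


def fraction_below(dist, clusters, threshold):
    """Fraction of within-cluster pairwise distances below the threshold."""
    within = [dist[i][j] for c in clusters for i, j in combinations(c, 2)]
    if not within:
        return 1.0  # only singletons: the criterion holds vacuously
    return sum(d < threshold for d in within) / len(within)


def choose_partition_number(dist, threshold, min_fraction=0.95):
    """Return the smallest k (and its clusters) meeting the criterion.

    min_fraction is a hypothetical stand-in for "most" in the abstract.
    """
    partitions = average_linkage_partitions(dist)
    for k in sorted(partitions):
        if fraction_below(dist, partitions[k], threshold) >= min_fraction:
            return k, partitions[k]


# Toy data: two groups of three isolates, close within a group, far between.
groups = [0, 0, 0, 1, 1, 1]
toy = [[0.0 if i == j else (0.1 if groups[i] == groups[j] else 0.9)
        for j in range(6)] for i in range(6)]
k, parts = choose_partition_number(toy, threshold=0.5)
```

On this toy matrix the sketch selects two partitions, one per group. Scanning from the smallest k upward favors the coarsest dissection that still satisfies the threshold criterion; whether the paper resolves ties the same way is not stated in the abstract.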
Publisher
Oxford University Press (OUP)
Cited by
4 articles.