Author:
Barker Dillon OR, Carriço João A, Kruczkiewicz Peter, Palma Federica, Rossi Mirko, Taboada Eduardo N
Abstract
Whole-genome sequencing (WGS) of microbial pathogens has become an essential part of modern epidemiological investigations. Although WGS data can be analyzed using a number of different approaches, such as traditional phylogenetic methods, a critical requirement for global pathogen surveillance systems is the development of approaches for transforming sequence data into WGS-based subtypes, creating a nomenclature that describes their higher-order relationships to one another. To this end, subtype similarity thresholds are needed to define clusters of subtypes representing lineages of interest. WGS-based subtyping presents a challenge, since both the addition of novel genome sequences and small adjustments in similarity thresholds can have a dramatic impact on cluster composition and stability. We present the Neighbourhood Adjusted Wallace Coefficient (nAWC), a method for evaluating cluster stability based on computing cluster concordance between neighbouring similarity thresholds. The nAWC can be used to identify ranges in which distance thresholds produce robust clusters. Using datasets from Salmonella enterica and Campylobacter jejuni, representing strongly and weakly clonal bacterial species respectively, we show that clusters generated using such thresholds are both stable and reflect basic units in their overall population structure. Our results suggest that the nAWC could be useful for defining robust clusters compatible with nomenclatures for global WGS-based surveillance networks, which require stable clusters that both harness the discriminatory power of WGS data and allow for long-term tracking of strains of interest.
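The idea of computing cluster concordance between neighbouring thresholds can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes the standard Adjusted Wallace definition, (W - W_i) / (1 - W_i), where W is the Wallace coefficient from partition A to partition B and W_i is its expected value under independence. It also assumes that each threshold is compared with the next lower (finer) one; the abstract does not state the direction, but for nested partitions the opposite direction is always 1 and so carries no information. The function names `adjusted_wallace` and `neighbourhood_awc` are hypothetical.

```python
from collections import Counter


def _pairs(n):
    """Number of unordered pairs that can be drawn from n items."""
    return n * (n - 1) // 2


def adjusted_wallace(a, b):
    """Adjusted Wallace coefficient from partition a to partition b.

    a and b are equal-length sequences of cluster labels, one per genome.
    Returns NaN when the coefficient is undefined.
    """
    if len(a) != len(b):
        raise ValueError("partitions must cover the same genomes")
    total = _pairs(len(a))
    pairs_a = sum(_pairs(c) for c in Counter(a).values())
    pairs_b = sum(_pairs(c) for c in Counter(b).values())
    pairs_both = sum(_pairs(c) for c in Counter(zip(a, b)).values())
    if pairs_a == 0 or pairs_b == total:
        # a has only singletons, or b is a single cluster: undefined
        return float("nan")
    wallace = pairs_both / pairs_a   # P(together in b | together in a)
    expected = pairs_b / total       # Wallace expected under independence
    return (wallace - expected) / (1 - expected)


def neighbourhood_awc(partitions):
    """Concordance of each threshold with its lower neighbour.

    partitions is a list of label sequences ordered from the lowest
    (finest) to the highest (coarsest) distance threshold. Comparing
    threshold i with threshold i - 1 is an assumption of this sketch.
    """
    return [
        adjusted_wallace(partitions[i], partitions[i - 1])
        for i in range(1, len(partitions))
    ]
```

Under these assumptions, a run of consecutive thresholds with values close to 1 would mark a range where raising the threshold barely changes cluster membership, which is the kind of stable region the abstract describes.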
Publisher
Cold Spring Harbor Laboratory
Cited by
9 articles.