Abstract
Unified strain taxonomies are crucial for fostering international communication in microbiological research and for the epidemiological surveillance of bacterial pathogens. While multilocus sequence typing (MLST) has served as a foundation of strain taxonomy for two decades, whole genome sequencing enables more precise classifications and significantly improves discriminatory resolution. The core genome-wide extension of MLST (known as cgMLST) thus holds great promise for strain genotyping and classification, but its implementation faces challenges that include missing data, potential instability of cluster-based nomenclatures, and the necessity to ensure backwards compatibility with MLST identifiers. Life Identification Number (LIN) codes offer a solution by providing multi-level classification groups that are inherently stable. Here, we present, consolidate, and extend the cgMLST-based LIN code approach. We first develop a nicknaming system for LIN code prefixes, which enables flexible, human-readable strain nomenclatures. Using Klebsiella pneumoniae (Kp) as an example, LIN code nicknames were attributed by inheritance from MLST identifiers, thus perpetuating the legacy of MLST nomenclatures in the genomic era. We show that while 7-gene MLST sometimes conflates unrelated sublineages into the same sequence type (ST), cgMLST-based LIN codes are highly concordant with phylogenetic relationships. We implement this novel LIN code-based nomenclature in the BIGSdb platform and illustrate, with Pathogenwatch, how it can also be used in other genomic epidemiology platforms. Finally, we demonstrate the value of LIN codes for tracking strain diversity within high-risk, internationally disseminated clonal groups of Kp and within protracted outbreaks. Given its stability, precision, and flexibility, we recommend the adoption of the cgMLST-based LIN code taxonomic approach for Kp and suggest that this approach is widely applicable to other bacterial pathogens.
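The abstract does not spell out how LIN codes are assigned or how prefix nicknames are read off them. The following is a minimal sketch, assuming the nearest-neighbour prefix-inheritance scheme generally used for LIN codes. A new genome copies the leading positions of its closest already-coded genome for every threshold its allelic distance falls within, then takes the next unused integer at the first position it does not share. Existing codes are never rewritten, which is what makes the groups stable. The thresholds, nickname table, toy profiles, and function names (`assign_lin`, `nicknames`, `mismatches`) are hypothetical. Skipping missing loci when counting mismatches is a simplification, not the method used by BIGSdb or Pathogenwatch.

```python
from typing import Dict, List, Optional, Sequence, Tuple

# HYPOTHETICAL thresholds (max allelic mismatches allowed to share a position),
# ordered from the coarsest to the finest LIN code position.
# Illustrative only; these are not the published Kp thresholds.
THRESHOLDS: Tuple[int, ...] = (8, 5, 2, 0)

Profile = Sequence[Optional[int]]  # one allele ID per locus; None = missing
LinCode = Tuple[int, ...]


def mismatches(a: Profile, b: Profile) -> int:
    """Count loci called in both profiles whose alleles differ.

    Loci missing in either profile are skipped (a simplification).
    """
    return sum(1 for x, y in zip(a, b)
               if x is not None and y is not None and x != y)


def assign_lin(profile: Profile,
               coded: List[Tuple[Profile, LinCode]]) -> LinCode:
    """Assign a LIN code without changing any previously assigned code."""
    n = len(THRESHOLDS)
    if not coded:
        return (0,) * n  # the first genome defines the all-zero code
    # Nearest previously coded genome (the first one wins on ties).
    best_d, best_code = min(((mismatches(profile, p), c) for p, c in coded),
                            key=lambda t: t[0])
    # Number of leading positions whose threshold the distance does not exceed.
    k = 0
    while k < n and best_d <= THRESHOLDS[k]:
        k += 1
    if k == n:
        return best_code  # within the finest threshold: same code
    prefix = best_code[:k]
    # Next unused integer at position k among codes sharing this prefix.
    used = [c[k] for _, c in coded if c[:k] == prefix]
    return prefix + (max(used) + 1,) + (0,) * (n - k - 1)


# HYPOTHETICAL nicknames attached to LIN code prefixes.
NICKNAMES: Dict[LinCode, str] = {
    (0,): "Group-A",
    (0, 0): "Subgroup-A1",
    (1,): "Group-B",
}


def nicknames(code: LinCode) -> List[str]:
    """Return the nicknames of all named prefixes of `code`, coarsest first."""
    return [NICKNAMES[code[:k]] for k in range(1, len(code) + 1)
            if code[:k] in NICKNAMES]


# Toy 10-locus profiles, coded in order of arrival.
profiles = [
    [1] * 10,
    [1] * 9 + [2],
    [1] * 6 + [2] * 4,
    [2] * 10,
    [1] * 10,
    [3] * 10,
]
coded: List[Tuple[Profile, LinCode]] = []
for prof in profiles:
    coded.append((prof, assign_lin(prof, coded)))

for _, code in coded:
    print(code, nicknames(code))
```

In this toy run the third genome shares only the first two positions with its nearest neighbour and receives `(0, 0, 1, 0)`, while a profile identical to the first genome reuses `(0, 0, 0, 0)`. Because a nickname is bound to a prefix rather than to a cluster, every genome that later inherits that prefix automatically carries the same name.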
Publisher
Cold Spring Harbor Laboratory