Author:
van Aggelen Helen, Kolde Raivo, Chamarthi Hareesh, Loving Joshua, Fan Yu, Fallon John T., Huang Weihua, Wang Guiqing, Fortunato-Habib Mary M., Carmona Juan J., Gross Brian D.
Abstract
Whole-genome sequencing is increasingly adopted in clinical settings to identify pathogen transmissions. Currently, such studies are performed largely retrospectively, but to be actionable they need to be carried out prospectively, with samples continuously added and compared to previous samples. To enable prospective pathogen comparison, genomic relatedness metrics based on single nucleotide differences must be consistent across time, efficient to compute and reliable for a large variety of samples. The choice of genomic regions to compare, i.e., the core genome, is critical to obtain a good metric. We propose a novel core genome method that selects conserved sequences in the reference genome by comparing its k-mer content to that of publicly available genome assemblies. The conserved-sequence genome is sample set-independent, which enables prospective pathogen monitoring. Based on clinical data sets of 3436 S. aureus, 1362 K. pneumoniae and 348 E. faecium samples, we show that the conserved-sequence genome disambiguates same-patient samples better than a core genome consisting of conserved genes. The conserved-sequence genome confirms outbreak samples with high accuracy: in a set of 2335 S. aureus samples, it correctly identifies 44 out of 45 outbreak samples, whereas the conserved gene method confirms 38 out of 45 outbreak samples.
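The idea of selecting conserved reference regions by k-mer comparison can be illustrated with a toy sketch. This is not the authors' implementation: the function names (`kmers`, `conserved_mask`, `snp_distance`), the `min_fraction` threshold, and the small value of k are all assumptions made for illustration, and the sketch ignores reverse complements, read mapping and variant calling, which a real pipeline would need. It marks the reference positions covered by k-mers found in most assemblies, then counts single nucleotide differences only inside that mask.

```python
from typing import Iterable, List, Set


def kmers(seq: str, k: int) -> Set[str]:
    """Return the set of all k-mers in seq (forward strand only)."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def conserved_mask(reference: str, assemblies: Iterable[str], k: int = 5,
                   min_fraction: float = 0.95) -> List[bool]:
    """Mark reference positions covered by at least one conserved k-mer.

    A reference k-mer is treated as conserved when it occurs in at least
    min_fraction of the assemblies (a hypothetical threshold, not a value
    taken from the paper).
    """
    reference = reference.upper()
    assembly_kmers = [kmers(a, k) for a in assemblies]
    n = len(assembly_kmers)
    mask = [False] * len(reference)
    for i in range(len(reference) - k + 1):
        kmer = reference[i:i + k]
        hits = sum(kmer in s for s in assembly_kmers)
        if n and hits / n >= min_fraction:
            # Every base of a conserved k-mer joins the core genome mask.
            for j in range(i, i + k):
                mask[j] = True
    return mask


def snp_distance(a: str, b: str, mask: List[bool]) -> int:
    """Count mismatches between two reference-aligned sequences inside the mask."""
    return sum(1 for x, y, keep in zip(a, b, mask) if keep and x != y)


if __name__ == "__main__":
    reference = "AAACCCGGG"
    public_assemblies = ["AAACCCTTT", "AAACCCGGG"]
    mask = conserved_mask(reference, public_assemblies, k=3, min_fraction=1.0)
    print(mask)
    print(snp_distance("AAATCCGGG", "AAACCCGGA", mask))
```

Because the mask depends only on the reference and the public assemblies, not on the clinical samples being compared, the distance between two samples does not change when new samples are added later, which is the property the abstract calls sample set-independence.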
Publisher
Cold Spring Harbor Laboratory
Cited by
1 article.