Abstract
Studies of bacterial pathogenicity have traditionally focused on gene-level content with experimentally confirmed functional properties. Hence, risk inferences rely heavily on similarity to known pathotypes and on DNA-based genomic subtyping. Herein, we achieved de novo prediction of human virulence in Klebsiella pneumoniae by expanding known virulence genes with spatially proximal gene discoveries linked by functional domain architectures across all prokaryotes. This approach identified gene ontology functions not typically associated with virulence sensu stricto. Using machine learning models built on these expanded discoveries, we assessed public genomes for virulence, with categorizations derived from isolation sources captured in available metadata. De novo strain-level virulence prediction achieved an F1-score of 0.81. Virulence predictions using expanded “discovered” functional genetic content were superior to those restricted to extant virulence database content. Additionally, this approach highlighted the incongruence of relying on traditional phylogenetic subtyping for categorical inferences. Our approach represents an improved deconstruction of genome-scale datasets for functional prediction and risk assessment, intended to advance public health surveillance of emerging pathogens.
Publisher
Cold Spring Harbor Laboratory