Abstract
Haemophilus influenzae is part of the human nasopharyngeal microbiota and a pathogen causing invasive disease. The extensive genetic diversity observed in H. influenzae necessitates discriminatory analytical approaches to evaluate its population structure. This study developed a core genome MLST (cgMLST) scheme for H. influenzae using pangenome analysis tools and validated the scheme using datasets consisting of complete reference genomes (N=14) and high-quality draft H. influenzae genomes (N=2,297). The draft genome dataset was divided into a development dataset (N=921) and a validation dataset (N=1,376). The development dataset was used to identify potential core genes, and the validation dataset was used to refine the final core gene list to ensure the reliability of the proposed cgMLST scheme. Functional classifications were made for all resulting core genes. Phylogenetic analyses were performed using both allelic profiles and nucleotide sequence alignments of the core genome to test congruence, as assessed by Spearman's correlation and ordinary least squares linear regression. Preliminary analyses using the development dataset identified 1,067 core genes, which were refined to 1,037 with the validation dataset. More than 70% of core genes were predicted to encode proteins essential for metabolism or genetic information processing. Phylogenetic and statistical analyses indicated that the core genome allelic profile accurately represented phylogenetic relatedness among the isolates (R² = 0.945). We used this cgMLST scheme to define a high-resolution population structure for H. influenzae, which enhances the genomic analysis of this clinically relevant human pathogen.

Impact statement
Discriminating H. influenzae variants and evaluating population structure has been challenging and largely unstandardised. To address this, we have developed a cgMLST scheme for H. influenzae. Since an accurate typing approach relies on precise reflection of the underlying population structure, we explored various methods to define the scheme. The core genes included in this scheme were predicted to encode functions in essential biological pathways, such as metabolism and genetic information processing, and could be reliably assembled from short-read sequence data. Single-linkage clustering based on core genome allelic profiles showed high congruence with the genealogy reconstructed by Maximum-Likelihood (ML) methods from the core genome nucleotide alignment. The cgMLST scheme v1 enables rapid and accurate depiction of high-resolution H. influenzae population structure. Making the scheme accessible via the PubMLST database ensures that microbiology reference laboratories and public health authorities worldwide can use it for genomic surveillance.

Data summary
The H. influenzae cgMLST scheme is accessible via https://pubmlst.org/organisms/haemophilus-influenzae. The list of isolate IDs available publicly from pubmlst.org is provided in Supplementary File 1. The pipeline for cgMLST scheme development and validation is published at https://www.protocols.io/private/EF6DB7FE429311EEB8630A58A9FEAC02. All in-house R and Python scripts for data processing and analysis are available from https://gitfront.io/r/user-4399403/ZHt8DArALHcY/cgmlst-hinf/.
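
The congruence test described in the abstract (Spearman's correlation and ordinary least squares regression between allelic-profile and nucleotide-alignment relatedness) can be illustrated with a minimal Python sketch. This is not the authors' pipeline, which is available from the repository listed in the data summary. The distance measures used here (count of differing loci for allelic profiles, proportion of differing sites for aligned sequences), the function names, and the toy isolates are all assumptions made for illustration. For a regression with a single predictor, R² equals the squared Pearson correlation, which is how it is computed below.

```python
from itertools import combinations

def allelic_distance(p, q):
    """Count loci with different allele IDs; loci missing (None) in either profile are skipped."""
    return sum(1 for a, b in zip(p, q) if a is not None and b is not None and a != b)

def nucleotide_distance(s, t):
    """Proportion of differing sites between two aligned sequences, skipping gaps and N."""
    compared = diffs = 0
    for a, b in zip(s, t):
        if a in "-N" or b in "-N":
            continue
        compared += 1
        diffs += a != b
    return diffs / compared if compared else 0.0

def pairwise(items, dist):
    """Distances for every isolate pair, in a fixed (sorted-name) order."""
    names = sorted(items)
    return [dist(items[a], items[b]) for a, b in combinations(names, 2)]

def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = rank
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def spearman(x, y):
    """Spearman's rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))

def ols_r_squared(x, y):
    """R^2 of the simple linear regression y ~ x (equals squared Pearson r)."""
    return pearson(x, y) ** 2

# Hypothetical toy data: four isolates, five core loci, ten aligned sites.
profiles = {
    "iso1": [1, 1, 1, 1, 1],
    "iso2": [1, 1, 1, 1, 2],
    "iso3": [1, 2, 2, 1, 2],
    "iso4": [3, 2, 2, 4, 5],
}
alignment = {
    "iso1": "AAAAAAAAAA",
    "iso2": "AAAAAAAAAC",
    "iso3": "AACCAAAAAC",
    "iso4": "GACCAGTAAT",
}

allelic = pairwise(profiles, allelic_distance)
nucleotide = pairwise(alignment, nucleotide_distance)
rho = spearman(allelic, nucleotide)
r2 = ols_r_squared(allelic, nucleotide)
print(f"Spearman rho = {rho:.3f}, OLS R^2 = {r2:.3f}")
```

Note that pairwise distances are not statistically independent, so this sketch only shows how the two summary statistics are obtained. The study itself compared allelic profiles against an ML phylogeny of the core genome alignment, which the simple site-difference proportion used here does not reproduce.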
Publisher
Cold Spring Harbor Laboratory