ReporTree: a surveillance-oriented tool to strengthen the linkage between pathogen genetic clusters and epidemiological data

Published: 2022-09-30

Author:

Mixão Verónica¹, Pinto Miguel¹, Sobral Daniel¹, Di Pasquale Adriano², Gomes João Paulo¹, Borges Vitor¹

Affiliation:

1. Genomics and Bioinformatics Unit, Department of Infectious Diseases, National Institute of Health Dr. Ricardo Jorge, Av. Padre Cruz, 1600-609 Lisbon, Portugal

2. National Reference Centre (NRC) for Whole Genome Sequencing of microbial pathogens: data-base and bioinformatics analysis (GENPAT), Istituto Zooprofilattico Sperimentale dell’Abruzzo e del Molise “Giuseppe Caporale” (IZSAM), Teramo, Italy

Abstract

Background: Genomics-informed pathogen surveillance strengthens public health decision-making and plays an important role in the prevention and control of infectious diseases. A pivotal outcome of genomics surveillance is the identification of pathogen genetic clusters and their characterization in terms of geotemporal spread or linkage to clinical and demographic data. This task often consists of the visual exploration of (large) phylogenetic trees and associated metadata, which is time-consuming and difficult to reproduce.

Results: We developed ReporTree, a flexible bioinformatics pipeline that makes it possible to explore the complexity of pathogen diversity in order to rapidly identify genetic clusters at any (or all) distance thresholds (e.g., high-resolution thresholds used for outbreak detection or stable threshold ranges for nomenclature design) and to generate surveillance-oriented reports based on the available metadata, such as timespan, geography or vaccination/clinical status. By handling several input formats (SNP/allele matrices, trees/dendrograms, multiple sequence alignments, VCF files or distance matrices) and clustering methods, ReporTree is applicable to multiple pathogens. It is therefore a flexible resource that can be smoothly deployed in routine surveillance bioinformatics workflows with negligible computational and time costs. This is demonstrated through a benchmark using core genome (cg) or whole genome (wg) Multiple Locus Sequence Type (MLST) (cg/wgMLST) datasets of four foodborne bacterial pathogens (each comprising more than a thousand isolates), in which genetic clusters at possible outbreak level were identified and reported in a matter of seconds. To further validate the tool, we reproduced a previous large-scale study on Neisseria gonorrhoeae, demonstrating how ReporTree is able to rapidly identify the main species genogroups and characterize them with key surveillance metadata (e.g., antibiotic resistance data). By providing examples for SARS-CoV-2 and the foodborne bacterial pathogen Listeria monocytogenes, we show how this tool is currently a useful asset in genomics-informed routine surveillance and outbreak detection of a wide variety of species.

Conclusions: In summary, ReporTree is a pan-pathogen tool for automated and reproducible identification and characterization of genetic clusters that contributes to sustainable and efficient genomics-informed public health surveillance of pathogens. ReporTree is implemented in Python 3.8 and is freely available at https://github.com/insapathogenomics/ReporTree or as a Docker image at insapathogenomics/reportree.
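The core idea in the abstract, grouping isolates into genetic clusters at a chosen distance threshold and then summarizing each cluster with metadata, can be illustrated with a minimal Python sketch. This is not ReporTree's code or interface: the function names, the toy distance values, and the metadata fields (`date`, `country`) are all assumptions made for illustration, and single-linkage grouping is used here only as one simple example of a clustering method.

```python
from collections import defaultdict


def single_linkage_clusters(samples, dist, threshold):
    """Group samples linked by chains of pairwise distances <= threshold.

    Illustrative only: a plain union-find over a toy distance table.
    """
    parent = {s: s for s in samples}

    def find(s):
        # Follow parent links to the cluster representative (with path halving).
        while parent[s] != s:
            parent[s] = parent[parent[s]]
            s = parent[s]
        return s

    for i, a in enumerate(samples):
        for b in samples[i + 1:]:
            if dist[frozenset((a, b))] <= threshold:
                parent[find(a)] = find(b)

    groups = defaultdict(list)
    for s in samples:
        groups[find(s)].append(s)
    return sorted(sorted(g) for g in groups.values())


def summarize(clusters, metadata):
    """Report size, timespan and countries for clusters with >= 2 members."""
    report = []
    for members in clusters:
        if len(members) < 2:
            continue  # singletons are not reported as clusters in this sketch
        dates = sorted(metadata[m]["date"] for m in members)  # ISO dates sort as text
        report.append({
            "members": members,
            "n": len(members),
            "first": dates[0],
            "last": dates[-1],
            "countries": sorted({metadata[m]["country"] for m in members}),
        })
    return report


# Hypothetical pairwise distances (e.g., allele differences) between four isolates.
samples = ["s1", "s2", "s3", "s4"]
dist = {
    frozenset(("s1", "s2")): 2, frozenset(("s1", "s3")): 6,
    frozenset(("s1", "s4")): 40, frozenset(("s2", "s3")): 5,
    frozenset(("s2", "s4")): 41, frozenset(("s3", "s4")): 38,
}
# Hypothetical metadata table.
metadata = {
    "s1": {"date": "2022-01-03", "country": "PT"},
    "s2": {"date": "2022-01-10", "country": "PT"},
    "s3": {"date": "2022-02-01", "country": "IT"},
    "s4": {"date": "2021-11-20", "country": "PT"},
}

for threshold in (4, 7):
    clusters = single_linkage_clusters(samples, dist, threshold)
    print(threshold, summarize(clusters, metadata))
```

Running the same grouping at several thresholds, as in the final loop, mirrors the abstract's point that a tight threshold suits outbreak detection while a looser one yields broader groupings.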

Funder

Horizon 2020 Framework Programme

Publisher

Research Square Platform LLC

Cited by 3 articles.

1. Genome mining reveals the prevalence and extensive diversity of toxin–antitoxin systems in Staphylococcus aureus;Frontiers in Microbiology;2023-05-24

2. Listeria monocytogenes, Escherichia coli and Coagulase Positive Staphylococci in Cured Raw Milk Cheese from Alentejo Region, Portugal;Microorganisms;2023-01-27

3. Pathogenic Escherichia coli, Salmonella spp. and Campylobacter spp. in Two Natural Conservation Centers of Wildlife in Portugal: Genotypic and Phenotypic Characterization;Microorganisms;2022-10-27