A Supervised Statistical Learning Approach for Accurate Legionella pneumophila Source Attribution during Outbreaks

Author:

Buultjens Andrew H. (1,2), Chua Kyra Y. L. (3), Baines Sarah L. (1), Kwong Jason (1,2,3), Gao Wei (1), Cutcher Zoe (4,5), Adcock Stuart (4), Ballard Susan (3), Schultz Mark B. (3), Tomita Takehiro (3), Subasinghe Nela (3), Carter Glen P. (1,2), Pidot Sacha J. (1), Franklin Lucinda (4), Seemann Torsten (3,6), Gonçalves Da Silva Anders (2,3), Howden Benjamin P. (1,2,3), Stinear Timothy P. (1,2)

Affiliation:

1. Department of Microbiology and Immunology at the Peter Doherty Institute for Infection and Immunity, The University of Melbourne, Parkville, Victoria, Australia

2. Doherty Applied Microbial Genomics, The Peter Doherty Institute for Infection and Immunity, Parkville, Victoria, Australia

3. Microbiological Diagnostic Unit Public Health Laboratory at the Peter Doherty Institute for Infection and Immunity, The University of Melbourne, Parkville, Victoria, Australia

4. Health Protection Branch, Department of Health and Human Services, Melbourne, Victoria, Australia

5. National Centre for Epidemiology and Population Health, Australian National University, Canberra, ACT, Australia

6. Victorian Life Sciences Computational Initiative, The University of Melbourne, Carlton, Victoria, Australia

Abstract

Public health agencies are increasingly relying on genomics during Legionnaires' disease investigations. However, the causative bacterium (Legionella pneumophila) has an unusual population structure, with extreme temporal and spatial genome sequence conservation. Furthermore, Legionnaires' disease outbreaks can be caused by multiple L. pneumophila genotypes in a single source. These factors can confound cluster identification using standard phylogenomic methods. Here, we show that a statistical learning approach based on L. pneumophila core genome single nucleotide polymorphism (SNP) comparisons eliminates ambiguity for defining outbreak clusters and accurately predicts exposure sources for clinical cases. We illustrate the performance of our method by genome comparisons of 234 L. pneumophila isolates obtained from patients and cooling towers in Melbourne, Australia, between 1994 and 2014. This collection included one of the largest reported Legionnaires' disease outbreaks, which involved 125 cases at an aquarium. Using only sequence data from L. pneumophila cooling tower isolates and including all core genome variation, we built a multivariate model using discriminant analysis of principal components (DAPC) to find cooling tower-specific genomic signatures and then used it to predict the origin of clinical isolates. Model assignments were 93% congruent with epidemiological data, including the aquarium Legionnaires' disease outbreak and three other unrelated outbreak investigations. We applied the same approach to a recently described investigation of Legionnaires' disease within a UK hospital and observed a model predictive ability of 86%. We have developed a promising means to breach L. pneumophila genetic diversity extremes and provide objective source attribution data for outbreak investigations.

IMPORTANCE Microbial outbreak investigations are moving to a paradigm where whole-genome sequencing and phylogenetic trees are used to support epidemiological investigations. It is critical that outbreak source predictions are accurate, particularly for pathogens, like Legionella pneumophila, which can spread widely and rapidly via cooling system aerosols, causing Legionnaires' disease. Here, by studying hundreds of Legionella pneumophila genomes collected over 21 years around a major Australian city, we uncovered limitations with the phylogenetic approach that could lead to a misidentification of outbreak sources. We implement instead a statistical learning technique that eliminates the ambiguity of inferring disease transmission from phylogenies. Our approach takes geolocation information and core genome variation from environmental L. pneumophila isolates to build statistical models that predict with high confidence the environmental source of clinical L. pneumophila during disease outbreaks. We show the versatility of the technique by applying it to unrelated Legionnaires' disease outbreaks in Australia and the UK.
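
The workflow summarized in the abstract (train a DAPC model on cooling tower isolates only, then assign clinical isolates to a tower and compare with epidemiological links) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes a 0/1 encoding of core genome SNP sites, uses invented toy genotypes, keeps two principal components, and applies a linear discriminant rule with equal priors and a small ridge term; the tower labels and all function names are hypothetical.

```python
import math

def fit_pca(X, k, iters=500):
    """PCA via power iteration with deflation; returns (mean, components)."""
    n, p = len(X), len(X[0])
    mean = [sum(row[j] for row in X) / n for j in range(p)]
    Xc = [[row[j] - mean[j] for j in range(p)] for row in X]
    cov = [[sum(r[a] * r[b] for r in Xc) / (n - 1) for b in range(p)]
           for a in range(p)]
    comps = []
    for _comp in range(k):
        v = [float(j + 1) for j in range(p)]  # fixed, asymmetric start vector
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
        for _step in range(iters):
            w = [sum(cov[a][b] * v[b] for b in range(p)) for a in range(p)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm < 1e-12:
                break
            v = [x / norm for x in w]
        lam = sum(v[a] * cov[a][b] * v[b] for a in range(p) for b in range(p))
        # Deflate so the next pass finds the next component.
        cov = [[cov[a][b] - lam * v[a] * v[b] for b in range(p)] for a in range(p)]
        comps.append(v)
    return mean, comps

def project(x, mean, comps):
    """Principal component scores for one isolate."""
    return [sum((x[j] - mean[j]) * c[j] for j in range(len(x))) for c in comps]

def fit_lda_2d(scores, labels, ridge=1e-6):
    """Class means and inverse pooled within-class covariance (2 PCs only)."""
    classes = sorted(set(labels))
    means = {}
    for c in classes:
        pts = [s for s, lab in zip(scores, labels) if lab == c]
        means[c] = [sum(pt[d] for pt in pts) / len(pts) for d in range(2)]
    w = [[0.0, 0.0], [0.0, 0.0]]
    for s, lab in zip(scores, labels):
        dev = [s[0] - means[lab][0], s[1] - means[lab][1]]
        for a in range(2):
            for b in range(2):
                w[a][b] += dev[a] * dev[b]
    dof = len(scores) - len(classes)
    w = [[w[a][b] / dof + (ridge if a == b else 0.0) for b in range(2)]
         for a in range(2)]
    det = w[0][0] * w[1][1] - w[0][1] * w[1][0]
    w_inv = [[w[1][1] / det, -w[0][1] / det],
             [-w[1][0] / det, w[0][0] / det]]
    return means, w_inv

def predict(score, means, w_inv):
    """Assign to the class with the smallest Mahalanobis distance (equal priors)."""
    def dist_sq(mu):
        d = [score[0] - mu[0], score[1] - mu[1]]
        return sum(d[a] * w_inv[a][b] * d[b] for a in range(2) for b in range(2))
    return min(means, key=lambda c: dist_sq(means[c]))

# Toy environmental isolates (rows) by core SNP sites (columns); invented data.
tower_snps = [
    [1, 1, 1, 0, 0, 0, 0, 0, 0, 0], [1, 1, 1, 0, 0, 0, 0, 0, 1, 0],
    [1, 1, 0, 0, 0, 0, 0, 0, 0, 0], [0, 0, 0, 1, 1, 1, 0, 0, 0, 0],
    [0, 0, 0, 1, 1, 1, 0, 0, 0, 1], [0, 0, 0, 1, 1, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 1, 1, 0, 0], [0, 0, 0, 0, 0, 0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0, 0, 1, 1, 0, 1],
]
tower_labels = ["tower_A"] * 3 + ["tower_B"] * 3 + ["tower_C"] * 3

# Toy clinical isolates and their (hypothetical) epidemiological links.
clinical_snps = [
    [1, 1, 1, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 1, 1, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 1, 1, 1, 0],
]
epi_links = ["tower_A", "tower_B", "tower_C"]

# Train on environmental isolates only, then assign clinical isolates.
mean, comps = fit_pca(tower_snps, k=2)
train_scores = [project(x, mean, comps) for x in tower_snps]
class_means, w_inv = fit_lda_2d(train_scores, tower_labels)
predictions = [predict(project(x, mean, comps), class_means, w_inv)
               for x in clinical_snps]
congruence = sum(p == e for p, e in zip(predictions, epi_links)) / len(epi_links)
print(predictions, congruence)
```

The design point the sketch tries to capture is that the model never sees clinical isolates during training, so agreement with epidemiological links is measured on held-out data. A real analysis would retain more principal components, choose their number by cross-validation, and use an established DAPC implementation rather than this hand-rolled version.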

Funder

Department of Health | National Health and Medical Research Council

Publisher

American Society for Microbiology

Subject

Ecology, Applied Microbiology and Biotechnology, Food Science, Biotechnology
