Deep learning-based real-time detection of novel pathogens during sequencing

Author:

Bartoszewicz Jakub M (1), Genske Ulrich (1), Renard Bernhard Y (1)

Affiliation:

1. Digital Engineering Faculty, Hasso Plattner Institute, University of Potsdam, Prof.-Dr.-Helmert-Straße 2-3, 14482 Brandenburg, Germany

Abstract

Novel pathogens evolve quickly and may emerge rapidly, causing dangerous outbreaks or even global pandemics. Next-generation sequencing is the state of the art in open-view pathogen detection, and one of the few methods available at the earliest stages of an epidemic, even when the biological threat is unknown. Analyzing the samples as the sequencer is running can greatly reduce the turnaround time, but existing tools rely on close matches to lists of known pathogens and perform poorly on novel species. Machine learning approaches can predict if single reads originate from more distant, unknown pathogens but require relatively long input sequences and processed data from a finished sequencing run. Incomplete sequences contain less information, leading to a trade-off between sequencing time and detection accuracy. Using a workflow for real-time pathogenic potential prediction, we investigate which subsequences already allow accurate inference. We train deep neural networks to classify Illumina and Nanopore reads and integrate the models with HiLive2, a real-time Illumina mapper. This approach outperforms alternatives based on machine learning and sequence alignment on simulated and real data, including SARS-CoV-2 sequencing runs. After just 50 Illumina cycles, we observe an 80-fold sensitivity increase compared to real-time mapping. The first 250 bp of Nanopore reads, corresponding to 0.5 s of sequencing time, are enough to yield predictions more accurate than mapping the finished long reads. The approach could also be used for screening synthetic sequences against biosecurity threats.

Funder

German Academic Scholarship Foundation

BMBF-funded Computational Life Science initiative

German Network for Bioinformatics Infrastructure

Publisher

Oxford University Press (OUP)

Subject

Molecular Biology, Information Systems

