Interpretable detection of novel human viruses from genome sequencing data

Author:

Bartoszewicz Jakub M (1, 2, 3, 4), Seidel Anja (1, 2), Renard Bernhard Y (1, 3, 4)

Affiliation:

1. Bioinformatics (MF1), Department of Methodology and Research Infrastructure, Robert Koch Institute, 13353 Berlin, Germany

2. Department of Mathematics and Computer Science, Free University of Berlin, 14195 Berlin, Germany

3. Data Analytics and Computational Statistics, Hasso Plattner Institute for Digital Engineering, 14482 Potsdam, Brandenburg, Germany

4. Digital Engineering Faculty, University of Potsdam, 14482 Potsdam, Brandenburg, Germany

Abstract

Viruses evolve extremely quickly, so reliable methods for viral host prediction are necessary to safeguard biosecurity and biosafety alike. Novel human-infecting viruses are difficult to detect with standard bioinformatics workflows. Here, we predict whether a virus can infect humans directly from next-generation sequencing (NGS) reads. We show that deep neural architectures significantly outperform both shallow machine learning and standard, homology-based algorithms, cutting the error rates in half and generalizing to taxonomic units distant from those presented during training. Further, we develop a suite of interpretability tools and show that it can also be applied to other models beyond the host prediction task. We propose a new approach for convolutional filter visualization to disentangle the information content of each nucleotide from its contribution to the final classification decision. Nucleotide-resolution maps of the learned associations between pathogen genomes and the infectious phenotype can be used to detect regions of interest in novel agents, for example, the SARS-CoV-2 coronavirus, which was unknown before it caused the COVID-19 pandemic in 2020. All methods presented here are implemented as easy-to-install packages that not only enable analysis of NGS datasets without requiring any deep learning skills but also allow advanced users to easily train and explain new models for genomics.

Funder

German Academic Scholarship Foundation

Federal Ministry of Education and Research

BMBF

Publisher

Oxford University Press (OUP)

Subject

General Medicine

References: 85 articles.

Cited by 39 articles.
