A hybrid and poly-polish workflow for the complete and accurate assembly of phage genomes: a case study of ten przondoviruses

Authors:

Elek Claire K. A., Brown Teagan L., Viet Thanh Le, Evans Rhiannon, Baker David J., Telatin Andrea, Tiwari Sumeet K., Al-Khanaq Haider, Thilliez Gaëtan, Kingsley Robert A., Hall Lindsay J., Webber Mark A., Adriaenssens Evelien M.

Abstract

Bacteriophages (phages) within the Przondovirus genus are T7-like podoviruses belonging to the Studiervirinae subfamily, within the Autographiviridae family, and have a highly conserved genome organisation. Their genomes range from 37 kb to 42 kb in size, encode 50-60 genes and are characterised by direct terminal repeats (DTRs) flanking the linear chromosome. These DTRs are often deleted during short-read-only and hybrid assemblies. Moreover, long-read-only assemblies are often littered with sequencing and/or assembly errors and require additional curation. Here, we present the isolation and characterisation of ten novel przondoviruses targeting Klebsiella spp. We describe HYPPA, a HYbrid and Poly-polish Phage Assembly workflow, which combines long-read assemblies with short-read sequencing to resolve phage DTRs and correct errors, removing the need for laborious primer walking and Sanger sequencing validation. Our data demonstrate the importance of careful curation of phage assemblies before publication and before using them for comparative genomics.

IMPACT STATEMENT

The workflows currently employed for phage genome assembly are often error-prone and can lead to many incomplete phage genomes being deposited in databases. This can create challenges when performing comparative genomics and may also lead to incorrect taxonomic assignment. To overcome these challenges, we propose HYPPA, a workflow that can produce complete, high-quality phage genomes without the need for laborious lab-based validation.

DATA SUMMARY

Phage raw reads are available from the National Center for Biotechnology Information Sequence Read Archive (NCBI-SRA) under BioProject PRJNA914245. Annotated phage genomes have been deposited in GenBank under accessions OQ579023-OQ579032 (Table 1). Bacterial WGS data for clinical preterm infant samples have been deposited in GenBank under BioProject accession PRJNA471164 (Table S1). Bacterial raw reads for food samples are available from NCBI-SRA under individual accessions (SAMN33593347-SAMN33593351) and can be found under BioProject PRJNA941224 (Table S1). Strain-specific details for the bacteria and publicly available phages used in these analyses, along with accessions for the latter, can be found in Table S1 and Table S6, respectively. The CL1-CL8 clinical Klebsiella strains (Table S1) were obtained under a Materials Transfer Agreement; sequencing data and strain information for these strains are not available.
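The abstract notes that przondovirus genomes are linear chromosomes flanked by direct terminal repeats, and that these DTRs are often lost in short-read-only and hybrid assemblies. As an illustration only (not part of HYPPA), here is a minimal sketch of one way to check whether an assembled contig still carries a DTR: find the longest sequence that is both a prefix and a suffix of the contig, using the KMP prefix function. It assumes an exact-match definition of the repeat, so a single residual error in either copy would hide it. The function name `find_dtr` and the 20 bp `min_len` cutoff are hypothetical choices, not values from the study.

```python
def find_dtr(seq: str, min_len: int = 20) -> int:
    """Return the length of the longest exact direct terminal repeat (DTR).

    A DTR is the same sequence, in the same orientation, at both ends of a
    linear contig, i.e. the largest k < len(seq) with seq[:k] == seq[-k:].
    Repeats shorter than min_len are treated as chance matches and give 0.
    Assumes exact matching; sequencing errors in either copy are not tolerated.
    """
    seq = seq.upper()
    n = len(seq)
    # KMP prefix function: pi[i] is the length of the longest proper prefix
    # of seq[:i + 1] that is also a suffix of seq[:i + 1].
    pi = [0] * n
    for i in range(1, n):
        k = pi[i - 1]
        while k > 0 and seq[i] != seq[k]:
            k = pi[k - 1]
        if seq[i] == seq[k]:
            k += 1
        pi[i] = k
    # The value at the last position is the longest prefix/suffix overlap.
    longest = pi[-1] if n else 0
    return longest if longest >= min_len else 0


if __name__ == "__main__":
    # Toy contig: a 30 bp repeat flanking a 120 bp interior (hypothetical data).
    dtr = "ATGCGTACGTTAGCCGATCGATCGGCTAAC"
    interior = "GGGTTTCCCAAA" * 10
    print(find_dtr(dtr + interior + dtr))  # 30: both DTR copies present
    print(find_dtr(dtr + interior))        # 0: one copy collapsed away
```

A result of 0 on a genome expected to carry DTRs would suggest the repeat was collapsed or truncated during assembly; a linear-time prefix function is used so the check stays cheap even on 40 kb contigs.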

Publisher

Cold Spring Harbor Laboratory
