Comparisons of genome assembly tools for characterization of Mycobacterium tuberculosis genomes using hybrid sequencing technologies

Authors:

Trisakul Kanwara (1,2), Hinwan Yothin (1,2), Eisiri Jukgarin (2), Salao Kanin (1,2), Chaiprasert Angkana (3), Kamolwat Phalin (4), Tongsima Sissades (5), Campino Susana (6), Phelan Jody (6), Clark Taane G. (6,7), Faksri Kiatichai (1,2)

Affiliations:

1. Department of Microbiology, Faculty of Medicine, Khon Kaen University, Khon Kaen, Thailand

2. Research and Diagnostic Center for Emerging Infectious Diseases (RCEID), Khon Kaen University, Khon Kaen, Thailand

3. Office for Research and Development, Faculty of Medicine Siriraj Hospital, Mahidol University, Bangkok, Thailand

4. Division of Tuberculosis, Department of Disease Control, Ministry of Public Health, Bangkok, Thailand

5. National Biobank of Thailand, National Center for Genetic Engineering and Biotechnology, Pathum Thani, Thailand

6. Faculty of Infectious and Tropical Diseases, London School of Hygiene & Tropical Medicine, University of London, London, United Kingdom

7. Faculty of Epidemiology and Population Health, London School of Hygiene & Tropical Medicine, University of London, London, United Kingdom

Abstract

Background: Next-generation sequencing of Mycobacterium tuberculosis, the infectious agent that causes tuberculosis, is improving understanding of the genomic diversity of circulating lineages and strain types and informing knowledge of drug-resistance mutations. An increasingly popular approach to characterizing M. tuberculosis genomes (size: ~4.4 Mbp) and variants (e.g., single nucleotide polymorphisms (SNPs)) involves de novo assembly of sequence data.

Methods: We compared the performance of three genome assembly tools (Unicycler, RagOut, and RagTag) on sequence data from nine drug-resistant M. tuberculosis isolates (multidrug-resistant (MDR), n = 1; pre-extensively drug-resistant (pre-XDR), n = 8) generated on the Illumina HiSeq, Oxford Nanopore Technologies (ONT) PromethION, and PacBio platforms.

Results: Unicycler-based assemblies had significantly higher genome completeness (~98.7%; p = 0.01) than assemblies from the other tools (RagOut, 98.6%; RagTag, 98.6%). Across isolates and sequencing platforms, RagOut assemblies were significantly longer (4,418,574 ± 8,824 bp; p < 0.001) than Unicycler (4,377,642 ± 55,257 bp) and RagTag (4,380,711 ± 51,164 bp) assemblies. Because RagOut-based assemblies had the fewest contigs (~32) and the longest genome size (4,418,574 bp, vs. 4,411,532 bp for the H37Rv reference), they were chosen for downstream analysis. In pan-genome analysis, Illumina and PacBio hybrid assemblies yielded the greatest number of detected genes (4,639 genes; the H37Rv reference contains 3,976 genes), whereas Illumina and ONT hybrid assemblies yielded the highest number of SNPs. Hybrid assemblies incorporating ONT or PacBio long reads contained more genes (mean: 4,620 genes) than short-read assemblies alone (4,478 genes). All nine RagOut hybrid genome assemblies detected known mutations in genes associated with MDR-TB and pre-XDR-TB.
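The contig counts and assembly sizes compared above are standard contiguity metrics. The following is a minimal Python sketch, under stated assumptions, of how such metrics can be computed from an assembly FASTA file; it is an illustration, not the study's pipeline (dedicated tools such as QUAST report these metrics). The function names are hypothetical, and `percent_of_reference` is simply total assembly length divided by the H37Rv length quoted in the abstract. It is not the genome-completeness measure reported in the Results, whose method is not described here.

```python
"""Basic contiguity metrics for a genome assembly (illustrative sketch)."""

# Reference genome length quoted in the abstract for M. tuberculosis H37Rv.
H37RV_LENGTH_BP = 4_411_532


def fasta_lengths(lines):
    """Return the length of each record from an iterable of FASTA lines.

    Works with an open file handle or any list of strings. Blank lines are
    skipped, and sequence lines before the first header are ignored.
    """
    lengths = []
    current = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if current is not None:
                lengths.append(current)
            current = 0
        elif current is not None:
            current += len(line)
    if current is not None:
        lengths.append(current)
    return lengths


def assembly_stats(contig_lengths, reference_length=H37RV_LENGTH_BP):
    """Summarise contig count, total size, N50 and size relative to a reference."""
    if not contig_lengths:
        raise ValueError("assembly has no contigs")
    total = sum(contig_lengths)
    # N50: length of the contig at which the cumulative sum of contig
    # lengths (longest first) first reaches half of the total assembly size.
    running = 0
    n50 = 0
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if 2 * running >= total:
            n50 = length
            break
    return {
        "contigs": len(contig_lengths),
        "total_bp": total,
        "n50_bp": n50,
        "diff_vs_reference_bp": total - reference_length,
        "percent_of_reference": 100.0 * total / reference_length,
    }
```

For example, with a hypothetical file `assembly.fasta`, `with open("assembly.fasta") as fh: print(assembly_stats(fasta_lengths(fh)))` prints the summary. Applied to the RagOut mean size reported above (4,418,574 bp), `diff_vs_reference_bp` is 7,042 bp, about 0.16% longer than H37Rv.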
Conclusions: Unicycler performed best in terms of achieving contiguous genomes, whereas RagOut improved the quality of Unicycler's genome assemblies by producing a longer genome. Overall, our approach demonstrates that hybrid assembly of short and long reads can provide a more complete genome than short-read assembly alone, detecting more genes (including IS6110) and more SNPs in pan-genome analysis.
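At its core, a pan-genome comparison of this kind reduces to set operations on the genes detected in each assembly. The sketch below is a minimal illustration, not the study's pipeline: it assumes each assembly has already been annotated and its genes clustered into shared identifiers (the pan-genome tool used is not named here). The function names, and the gene labels in the test data, are hypothetical.

```python
"""Core/accessory split of a pan-genome from per-assembly gene sets (sketch)."""

from collections import Counter


def classify_pangenome(gene_sets, core_fraction=1.0):
    """Split a pan-genome into core and accessory genes.

    gene_sets maps an assembly name to the gene clusters detected in it.
    A gene is 'core' if it occurs in at least core_fraction of assemblies;
    every other gene in the pan-genome is 'accessory'.
    """
    if not gene_sets:
        raise ValueError("no assemblies supplied")
    # Count in how many assemblies each gene cluster appears.
    counts = Counter(gene for genes in gene_sets.values() for gene in set(genes))
    threshold = core_fraction * len(gene_sets)
    core = {gene for gene, n in counts.items() if n >= threshold}
    pan = set(counts)
    return {"pan": pan, "core": core, "accessory": pan - core}


def extra_genes(hybrid_genes, short_read_genes):
    """Genes found in a hybrid assembly but missing from the short-read-only one."""
    return set(hybrid_genes) - set(short_read_genes)
```

With hypothetical inputs such as `{"illumina": {...}, "illumina_pacbio": {...}}`, `extra_genes` lists the clusters that only the hybrid assembly recovered, which is the kind of gain (for example, IS6110 copies) described in the Conclusions.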

Funder

National Research Council of Thailand

Research and Diagnostic Center for Emerging Infectious Diseases (RCEID), Khon Kaen University, Khon Kaen, Thailand

UKRI MRC

EPSRC

Publisher

PeerJ
