Optimization of the “in‐silico” mate‐pair method improves contiguity and accuracy of genome assembly

Author:

Zhou Tao (1, 2), Lu Liang (1, 2), Li Chenhong (1, 2)

Affiliation:

1. Shanghai Universities Key Laboratory of Marine Animal Taxonomy and Evolution, Shanghai Ocean University, Shanghai, China

2. Shanghai Collaborative Innovation for Aquatic Animal Genetics and Breeding, Shanghai Ocean University, Shanghai, China

Abstract

A combination of short‐insert paired‐end libraries and mate‐pair libraries with large insert sizes is the standard approach for generating genome assemblies with high contiguity. Third‐generation sequencing techniques are also used to improve the quality of assembled genomes. However, both mate‐pair libraries and third‐generation libraries require high‐molecular‐weight DNA, which makes them unsuitable for samples that contain only degraded DNA. An in silico method that generates mate‐pair libraries from a reference genome was previously devised for assembling target genomes. Although this method significantly improved the contiguity and completeness of assembled genomes, the resulting assemblies contained a high level of errors, and the strategy for using reference genomes was not optimized. Here, we tested different strategies for using reference genomes to generate in silico mate‐pairs. The results showed that using a closely related reference genome from the same genus was more effective than using divergent references. Identifying conserved in silico mate‐pairs by comparing two references, and using those mate‐pairs to guide genome assembly, reduced the number of misassemblies (18.6%–46.1%) and increased the contiguity of assembled genomes (9.7%–70.7%), while maintaining gene completeness at a level similar to or marginally lower than that obtained with the current method. Finally, we developed a pipeline for the optimized in silico method and compared it with another reference‐guided assembler, RagTag. RagTag produced longer scaffolds (17.8 Mbp vs. 3.0 Mbp) but resulted in a much higher misassembly rate (85.68%) than our optimized in silico mate‐pair method. The optimized in silico pipeline developed in this study should facilitate further studies on genomics, population genetics, and conservation of endangered species.

Funder

Science and Technology Commission of Shanghai Municipality

Publisher

Wiley

Subject

Nature and Landscape Conservation; Ecology; Ecology, Evolution, Behavior and Systematics

Cited by 1 article.
