CSA: A high-throughput chromosome-scale assembly pipeline for vertebrate genomes

Published:2020-05-01 Issue:5 Volume:9
ISSN:2047-217X
Container-title:GigaScience
language:en

Author:

Kuhl Heiner¹, Li Ling¹˒², Wuertz Sven¹, Stöck Matthias¹, Liang Xu-Fang², Klopp Christophe³

Affiliation:

1. Department of Ecophysiology and Aquaculture, Leibniz-Institute of Freshwater Ecology and Inland Fisheries (IGB), Müggelseedamm 310, 12587 Berlin, Germany

2. College of Fisheries, Chinese Perch Research Center, Huazhong Agricultural University; Innovation Base for Chinese Perch Breeding, Key Lab of Freshwater Animal Breeding, Ministry of Agriculture, No.1 Shizishan Street, Hongshan District, 430070 Wuhan, Hubei Province, P.R. China

3. Sigenae, Bioinfo Genotoul, Mathématiques et Informatique Appliquées de Toulouse, INRAe, 24 Chemin de Borde Rouge, 31320 Auzeville-Tolosane, Castanet Tolosan, France

Abstract

Background: Easy-to-use and fast bioinformatics pipelines for long-read assembly that go beyond the contig level to generate highly continuous chromosome-scale genomes from raw data remain scarce.

Results: Chromosome-Scale Assembler (CSA) is a novel, computationally highly efficient bioinformatics pipeline that fills this gap. CSA integrates information from scaffolded assemblies (e.g., Hi-C or 10X Genomics) or even from diverged reference genomes into the assembly process. As CSA performs automated assembly of chromosome-sized scaffolds, we benchmark its performance against state-of-the-art reference genomes, i.e., genomes conventionally built in a laborious fashion using multiple separate assembly tools and manual curation. CSA increases the contig lengths using scaffolding, local re-assembly, and gap closing. On certain datasets, initial contig N50 may be increased up to 4.5-fold. For smaller vertebrate genomes, chromosome-scale assemblies can be achieved within 12 h using low-cost, high-end desktop computers. Mammalian genomes can be processed within 16 h on compute servers. Using diverged reference genomes for fish, birds, and mammals, we demonstrate that CSA calculates chromosome-scale assemblies from long-read data and genome comparisons alone. Even contig-level draft assemblies of diverged genomes are helpful for reconstructing chromosome-scale sequences. CSA is also capable of assembling ultra-long reads.

Conclusions: CSA can speed up and simplify chromosome-level assembly and significantly lower the costs of large-scale family-level vertebrate genome projects.
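The contig N50 statistic quoted in the abstract can be illustrated with a minimal sketch. It uses the standard definition of N50: the length of the shortest contig in the smallest set of longest contigs that together cover at least half of the total assembly length. The function name `n50` and the toy contig lengths are hypothetical and chosen only for illustration; they are not taken from the paper or from the CSA code.

```python
def n50(lengths):
    """Return the N50 of a collection of contig (or scaffold) lengths."""
    if not lengths:
        raise ValueError("need at least one length")
    total = sum(lengths)
    running = 0
    # Walk from the longest contig downwards until half the assembly is covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


# Hypothetical toy assembly (arbitrary units), before and after contigs are joined.
before = [8, 8, 4, 3, 3, 2, 2, 2]
after = [16, 7, 5, 4]  # same total length, fewer and longer contigs

fold_change = n50(after) / n50(before)
print(n50(before), n50(after), fold_change)
```

Joining contigs leaves the total assembly length unchanged but shifts the halfway point onto longer sequences, which is why scaffolding and gap closing raise N50.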

Funder

German Research Foundation

Publisher

Oxford University Press (OUP)

Subject

Computer Science Applications, Health Informatics

Link

http://academic.oup.com/gigascience/article-pdf/9/5/giaa034/33293165/giaa034.pdf
