Affiliation:
1. School of Biological Sciences, The University of Queensland, St Lucia, Queensland 4072, Australia
2. Department of Evolution and Ecology, University of California, Davis, California 95616
Abstract
Long-read sequencing technology promises to greatly enhance de novo assembly of genomes for nonmodel species. Although the error rates of long reads have been a stumbling block, sequencing at high coverage permits the self-correction of many errors. Here, we sequence and de novo assemble the genome of Drosophila serrata, a species from the montium subgroup that has been well studied for latitudinal clines, sexual selection, and gene expression, but which lacks a reference genome. Using 11 PacBio single-molecule real-time (SMRT) cells, we generated 12 Gbp of raw sequence data comprising ∼65× whole-genome coverage. Read lengths averaged 8940 bp (NRead50 12,200) with the longest read at 53 kbp. We self-corrected reads using the PBDagCon algorithm and assembled the genome using the MHAP algorithm within the PBcR assembler. Total genome length was 198 Mbp with an N50 just under 1 Mbp. Contigs displayed a high degree of chromosome arm-level conservation with the D. melanogaster genome and many could be sensibly placed on the D. serrata physical map. We also provide an initial annotation for this genome using in silico gene predictions that were supported by RNA-seq data.
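For readers unfamiliar with the N50 statistic quoted above, the following is a minimal sketch of how it is conventionally computed: contigs are sorted from longest to shortest, and N50 is the length of the contig at which the running total first reaches half of the assembly length. The function name `n50` and the toy contig lengths are illustrative assumptions, not data or code from the paper.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths (hypothetical helper)."""
    ordered = sorted(lengths, reverse=True)  # longest contig first
    half = sum(ordered) / 2                  # half of the total assembly length
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            # This contig is the one that pushes the running total past 50%.
            return length
    raise ValueError("no contig lengths supplied")


# Toy lengths for illustration only; these are NOT the D. serrata contigs.
toy_contigs = [100, 80, 60, 40, 20]
toy_n50 = n50(toy_contigs)
```

With these toy values the total is 300, so the threshold is 150; the running total reaches 180 at the second contig, giving an N50 of 80.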
Publisher
Oxford University Press (OUP)
Subject
Genetics (clinical), Genetics, Molecular Biology
Cited by
26 articles.