Authors:
Chin Chen-Shan, Peluso Paul, Sedlazeck Fritz J., Nattestad Maria, Concepcion Gregory T., Clum Alicia, Dunn Christopher, O'Malley Ronan, Figueroa-Balderas Rosa, Morales-Cruz Abraham, Cramer Grant R., Delledonne Massimo, Luo Chongyuan, Ecker Joseph R., Cantu Dario, Rank David R., Schatz Michael C.
Abstract
While genome assembly projects have been successful in a number of haploid or inbred species, one of the main current challenges is assembling non-inbred or rearranged heterozygous genomes. To address this critical need, we introduce the open-source FALCON and FALCON-Unzip algorithms (https://github.com/PacificBiosciences/FALCON/), which assemble Single Molecule Real-Time (SMRT®) Sequencing data into highly accurate, contiguous, and correctly phased diploid genomes. We demonstrate the quality of this approach by assembling new reference sequences for three heterozygous samples that have challenged short-read assembly approaches: an F1 hybrid of the model species Arabidopsis thaliana, the widely cultivated V. vinifera cv. Cabernet Sauvignon, and the coral fungus Clavicorona pyxidata. The FALCON-based assemblies were substantially more contiguous and complete than those produced by alternative short- or long-read approaches. The phased diploid assembly enabled the study of haplotype structures and heterozygosity between homologous chromosomes, including the identification of widespread heterozygous structural variants within coding sequences.
Publisher
Cold Spring Harbor Laboratory