Author:
Laspisa Daniel J., Schneider Kevin L., Presting Gernot G.
Abstract
Genome assemblies based on long-read sequencing technology have revolutionized the assembly of repeat-rich centromere regions. However, because maize centromeres are highly enriched for the tandem repeat CentC and centromeric retrotransposons (CR), automated genome assembly left gaps even in the excellent B73 RefGen_v4 reference genome constructed from long-read data. Manual editing of >140 Mb spanning the ten centromeres of maize inbred B73, without including any additional sequence data, resulted in the closure of 127 sequence gaps and the addition of >8.4 Mb of previously unanchored sequence (unitigs and reads) containing 24 genes, 2 Mb of CR repeat and 887 kb of CentC. The functional centromeres of five maize chromosomes were closed completely, including a 7 Mb region spanning the extremely CR2-rich CEN2. This improved assembly, B73 RefGen_v4CEN, was completed in February 2019 and has been available at https://doi.org/10.25739/7y1p-5169, both as pseudomolecules and as centromere assemblies alone. Thus, the manual editing of existing sequence data significantly improved the centromere regions of the B73 RefGen_v4 reference genome. These data were used for centromere analyses until the release of RefGen_v5.
Publisher
Cold Spring Harbor Laboratory