Author:
Zeng Qingdong,Cao Wenjin,Xing Liping,Qin Guowei,Wu Jianhui,Nagle Michael F.,Xiong Qin,Chen Jinhui,Yang Liming,Bajaj Prasad,Chitikineni Annapurna,Zhou Yan,Yu Yunxin,Xu Jiang,Nie Xiaojun,Huang Lin,Liu Shengjie,Šafář Jan,Šimková Hana,Song Weining,Guo Baozhu,Chen Shilin,Doležel Jaroslav,Hao Zhaodong,Cheng Qiang,Liang Jianguo,Tang Jiansong,Cao Aizhong,Wang Qiang,Lu Xiangqian,Yang Shouping,Ma Hongxiang,Liu Jiajie,Wang Xiaoting,Zhang Hong,Wang Zhonghua,Ji Wanquan,Wang Changfa,Yuan Fengping,Shi Jisen,Varshney Rajeev K.,Kang Zhensheng,Han Dejun,Xu Haibin
Abstract
Across domains of biological research that use genome sequence data, high-quality reference genome sequences are essential for characterizing genetic variation and understanding the genetic basis of phenotypes. However, the construction of genome assemblies for many species is often hampered by the complexity of genome organization, especially repetitive and complex sequences, which leads to mis-assemblies and missing regions. Here, we describe a high-throughput, gold-standard genome assembly workflow that uses a large-scale bacterial artificial chromosome (BAC) library with a refined two-step pooling strategy and the Lamp assembler algorithm. This strategy minimizes the laborious processes of physical map construction and clone-by-clone sequencing, enabling inexpensive sequencing of several thousand BAC clones. We applied this strategy to a minimum tiling path BAC clone library for the short arm of chromosome 2D (2DS) of bread wheat, a species with a highly complex and repetitive genome; 98% of BAC sequences, covering 92.7% of the 2DS chromosome, were assembled correctly. We also identified 48 large mis-assemblies in the reference wheat genome assembly (IWGSC RefSeq v1.0), corrected them, and filled 92.2% of the gaps in RefSeq v1.0. Our 2DS assembly represents a new benchmark for the assembly of complex genomes with both high accuracy and efficiency.
Publisher
Cold Spring Harbor Laboratory