Sequence and structural variation in a human genome uncovered by short-read, massively parallel ligation sequencing using two-base encoding

Author:

McKernan Kevin Judd, Peckham Heather E., Costa Gina L., McLaughlin Stephen F., Fu Yutao, Tsung Eric F., Clouser Christopher R., Duncan Cisyla, Ichikawa Jeffrey K., Lee Clarence C., Zhang Zheng, Ranade Swati S., Dimalanta Eileen T., Hyland Fiona C., Sokolsky Tanya D., Zhang Lei, Sheridan Andrew, Fu Haoning, Hendrickson Cynthia L., Li Bin, Kotler Lev, Stuart Jeremy R., Malek Joel A., Manning Jonathan M., Antipova Alena A., Perez Damon S., Moore Michael P., Hayashibara Kathleen C., Lyons Michael R., Beaudoin Robert E., Coleman Brittany E., Laptewicz Michael W., Sannicandro Adam E., Rhodes Michael D., Gottimukkala Rajesh K., Yang Shan, Bafna Vineet, Bashir Ali, MacBride Andrew, Alkan Can, Kidd Jeffrey M., Eichler Evan E., Reese Martin G., De La Vega Francisco M., Blanchard Alan P.

Abstract

We describe the genome sequencing of an anonymous individual of African origin using a novel ligation-based sequencing assay that enables a unique form of error correction that improves the raw accuracy of the aligned reads to >99.9%, allowing us to accurately call SNPs with as few as two reads per allele. We collected several billion mate-paired reads yielding ∼18× haploid coverage of aligned sequence and close to 300× clone coverage. Over 98% of the reference genome is covered with at least one uniquely placed read, and 99.65% is spanned by at least one uniquely placed mate-paired clone. We identify over 3.8 million SNPs, 19% of which are novel. Mate-paired data are used to physically resolve haplotype phases of nearly two-thirds of the genotypes obtained and produce phased segments of up to 215 kb. We detect 226,529 intra-read indels, 5590 indels between mate-paired reads, 91 inversions, and four gene fusions. We use a novel approach for detecting indels between mate-paired reads that are smaller than the standard deviation of the insert size of the library and discover deletions in common with those detected with our intra-read approach. Dozens of mutations previously described in OMIM and hundreds of nonsynonymous single-nucleotide and structural variants in genes previously implicated in disease are identified in this individual. There is more genetic variation in the human genome still to be uncovered, and we provide guidance for future surveys in populations and cancer biopsies.
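The abstract does not spell out how two-base encoding supports error correction, so the following is only a minimal sketch of the general principle as it is commonly described for color-space sequencing, not the authors' pipeline. It assumes the usual mapping in which each pair of adjacent bases yields one of four colors, computed here as the XOR of 2-bit base codes. The function names (`encode_colors`, `decode_colors`, `changed_positions`) and the example sequences are hypothetical and chosen purely for illustration.

```python
# Illustrative sketch of two-base (color-space) encoding.
# Assumption: colors are the XOR of 2-bit base codes (A=0, C=1, G=2, T=3).
# All names and sequences here are hypothetical, not taken from the paper.

BASE_TO_BITS = {"A": 0, "C": 1, "G": 2, "T": 3}
BITS_TO_BASE = {bits: base for base, bits in BASE_TO_BITS.items()}


def encode_colors(seq):
    """Return one color (0-3) for each pair of adjacent bases."""
    return [BASE_TO_BITS[a] ^ BASE_TO_BITS[b] for a, b in zip(seq, seq[1:])]


def decode_colors(first_base, colors):
    """Rebuild the base sequence from a known first base and the colors."""
    bases = [first_base]
    for color in colors:
        bases.append(BITS_TO_BASE[BASE_TO_BITS[bases[-1]] ^ color])
    return "".join(bases)


def changed_positions(ref_colors, read_colors):
    """Indices at which the read's colors differ from the reference colors."""
    return [i for i, (r, c) in enumerate(zip(ref_colors, read_colors)) if r != c]


reference = "ATGGCA"
with_snp = "ATGTCA"  # single base substitution at index 3 (G -> T)

ref_colors = encode_colors(reference)
snp_colors = encode_colors(with_snp)

# A true single-base substitution changes two adjacent colors.
print(changed_positions(ref_colors, snp_colors))

# An isolated color change (one miscalled color) alters every
# downstream base when decoded, so it is unlikely to be a real SNP.
one_color_error = list(ref_colors)
one_color_error[2] = 1
print(decode_colors("A", one_color_error))
```

Under this scheme, a candidate SNP supported by two adjacent color changes is consistent with a real substitution, whereas a lone color mismatch against the reference can be treated as a probable measurement error. This is one plausible reading of how the encoding could raise the accuracy of aligned reads.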

Publisher

Cold Spring Harbor Laboratory

Subject

Genetics (clinical), Genetics

Cited by 419 articles.
