Increasing calling accuracy, coverage, and read depth in sequence data by the use of haplotype blocks

Author:

Pook TorstenORCID,Nemri Adnane,Segovia Eric Gerardo Gonzalez,Simianer Henner,Schoen Chris-Carolin

Abstract

AbstractHigh-throughput genotyping of large numbers of lines remains a key challenge in plant genetics, requiring geneticists and breeders to find a balance between data quality and the number of genotyped lines under a variety of different existing technologies when resources are limited. In this work, we are proposing a new imputation pipeline (“HBimpute”) that can be used to generate high-quality genomic data from low read-depth whole-genome-sequence data. The key idea of the pipeline is the use of haplotype blocks from the software HaploBlocker to identify locally similar lines and merge their reads locally. The effectiveness of the pipeline is showcased on a dataset of 321 doubled haploid lines of a European maize landrace, which were sequenced with 0.5X read-depth. Overall imputing error rates are cut in half compared to the state-of-the-art software BEAGLE, while the average read-depth is increased to 83X, thus enabling the calling of structural variation. The usefulness of the obtained imputed data panel is further evaluated by comparing the performance in common breeding applications to that of genomic data from a 600k array. In particular for genome-wide association studies, the sequence data is shown to be performing slightly better. Furthermore, genomic prediction based on the overlapping markers from the array and sequence is leading to a slightly higher predictive ability for the imputed sequence data, thereby indicating that the data quality obtained from low read-depth sequencing is on par or even slightly higher than high-density array data. When including all markers for the sequence data, the predictive ability is slightly reduced indicating overall lower data quality in non-array markers.Author summaryHigh-throughput genotyping of large numbers of lines remains a key challenge in plant genetics and breeding. Cost, precision, and throughput must be balanced to achieve optimal efficiencies given available technologies and finite resources. Although genotyping arrays are still considered the gold standard in high-throughput quantitative genetics, recent advances in sequencing provide new opportunities for this. Both the quality and cost of genomic data generated based on sequencing are highly dependent on the used read depth. In this work, we are proposing a new imputation pipeline (“HBimpute”) that uses haplotype blocks to detect individuals of the same genetic origin and subsequently uses all reads of those individuals in the variant calling. Thus, the obtained virtual read depth is artificially increased, leading to higher calling accuracy, coverage, and the ability to all copy number variation based on relatively cheap low-read depth sequencing data. Thus, our approach makes sequencing a cost-competitive alternative to genotyping arrays with the additional benefit of the potential use of structural variation.

Publisher

Cold Spring Harbor Laboratory

Cited by 2 articles. 订阅此论文施引文献 订阅此论文施引文献,注册后可以免费订阅5篇论文的施引文献,订阅后可以查看论文全部施引文献

同舟云学术

1.学者识别学者识别

2.学术分析学术分析

3.人才评估人才评估

"同舟云学术"是以全球学者为主线,采集、加工和组织学术论文而形成的新型学术文献查询和分析系统,可以对全球学者进行文献检索和人才价值评估。用户可以通过关注某些学科领域的顶尖人物而持续追踪该领域的学科进展和研究前沿。经过近期的数据扩容,当前同舟云学术共收录了国内外主流学术期刊6万余种,收集的期刊论文及会议论文总量共计约1.5亿篇,并以每天添加12000余篇中外论文的速度递增。我们也可以为用户提供个性化、定制化的学者数据。欢迎来电咨询!咨询电话:010-8811{复制后删除}0370

www.globalauthorid.com

TOP

Copyright © 2019-2024 北京同舟云网络信息技术有限公司
京公网安备11010802033243号  京ICP备18003416号-3