Abstract
Low-density genotyping followed by imputation reduces genotyping costs while still providing high-density marker information. An increased marker density has the potential to improve the outcome of all applications that are based on genomic data. This study investigates techniques for 1k to 20k genomic marker imputation for plant breeding programs, with sugar beet as an example crop, where these are realistic marker numbers for modern breeding applications.

The generally accepted 'gold standard' for imputation, Beagle 5.1, was compared to the recently developed software AlphaPlantImpute2, which is designed specifically for plant breeding. For both Beagle 5.1 and AlphaPlantImpute2, the imputation strategy and the imputation parameters were optimized in this study. We found that the imputation accuracy of Beagle could be improved substantially (0.22 to 0.67) by tuning parameters, mainly by lowering the value of the effective population size parameter and increasing the number of iterations performed. Separating the phasing and imputation steps improved accuracies further when optimized parameters were used (0.67 to 0.82). We also found that the imputation accuracy of Beagle decreased when more low-density lines were included for imputation. AlphaPlantImpute2 produced very high accuracies without optimization (0.89) and was generally less responsive to optimization. Overall, AlphaPlantImpute2 performed better for imputation, while Beagle was better for phasing. Combining both tools yielded the highest accuracies.

Summary
Genotype marker information allows the prediction of an individual's breeding value without the need to observe its actual phenotype, which can accelerate breeding progress. The more markers are genotyped, the better the genomic prediction may be. However, analyzing many markers is costly, particularly in commercial breeding programs where thousands of new individuals are genotyped.
A solution to obtain information for all markers, while spending comparatively little on genotyping, is to genotype only a small fraction of markers in most individuals. Together with high-density information on other individuals, the low-density individuals can be imputed to high density. High-density individuals are typically parents or highly influential individuals.

In this study, we compare the widely used software Beagle with the recently developed software AlphaPlantImpute2 on plant breeding data. To allow a fair comparison, we first optimized existing methods and developed new approaches, so that a suboptimal configuration of one program was not compared with optimized settings of the other. After optimization, the programs were evaluated in different scenarios with regard to genotyping errors, population types, and number of markers, based on simulated data. The simulations used real marker data from a sugar beet population as input to mimic the population history of a commercial breeding population.

AlphaPlantImpute2 performs well with default parameters, while much optimization with regard to parameters and strategy was needed to boost the accuracies of Beagle. A pipeline is presented which uses Beagle for phasing and AlphaPlantImpute2 for imputation. This pipeline yielded the highest accuracies and shortest run time.

Core Ideas
Beagle is sensitive to parameter tuning.
The best imputation accuracies were achieved by combining Beagle and AlphaPlantImpute2.
Population structure influences imputation accuracy.
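As a rough illustration of the Beagle tuning described in the abstract (a lower effective population size, more iterations, and phasing separated from imputation), the sketch below only assembles two Beagle 5.1 command lines and does not run them. The argument names `gt`, `ref`, `out`, `ne`, `iterations` and `impute` are Beagle 5.1 options. The file names, the jar path, and the values of `NE` and `ITERATIONS` are hypothetical placeholders, not the settings used in the study. Phasing the high-density panel first and then passing it as `ref` is one assumed way to separate the two steps. AlphaPlantImpute2 is left out because its command line is not described here.

```python
def beagle_cmd(jar, gt, out, ne, iterations, ref=None, impute=True):
    """Build (but do not run) a Beagle 5.1 command line as an argument list."""
    cmd = [
        "java", "-jar", jar,
        f"gt={gt}",                  # input genotypes (VCF)
        f"out={out}",                # output prefix; Beagle writes <out>.vcf.gz
        f"ne={ne}",                  # effective population size
        f"iterations={iterations}",  # number of phasing iterations
        f"impute={'true' if impute else 'false'}",
    ]
    if ref is not None:
        cmd.append(f"ref={ref}")     # phased reference panel
    return cmd


# Placeholder values for illustration only (assumptions, not the study's settings).
NE = 100
ITERATIONS = 50

# Step 1: phase the high-density panel without imputing.
phase_cmd = beagle_cmd("beagle.jar", "hd_panel.vcf.gz", "hd_phased",
                       NE, ITERATIONS, impute=False)

# Step 2: impute the low-density lines against the phased panel.
impute_cmd = beagle_cmd("beagle.jar", "ld_lines.vcf.gz", "ld_imputed",
                        NE, ITERATIONS, ref="hd_phased.vcf.gz")

print(" ".join(phase_cmd))
print(" ".join(impute_cmd))
```

Keeping the commands as argument lists makes it easy to vary `ne` and `iterations` across runs and to hand each list to `subprocess.run` once the placeholder paths are replaced with real files.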
Publisher
Cold Spring Harbor Laboratory
Cited by
1 article.