Dimensionality of genomic information and its impact on genome-wide associations and variant selection for genomic prediction: a simulation study

Published:2023-07-17 Issue:1 Volume:55
ISSN:1297-9686
Container-title:Genetics Selection Evolution
Language:en
Short-container-title:Genet Sel Evol

Author:

Jang Sungbong, Tsuruta Shogo, Leite Natalia Galoro, Misztal Ignacy, Lourenco Daniela

Abstract

Background: Identifying true positive variants in genome-wide associations (GWA) depends on several factors, including the number of genotyped individuals. The limited dimensionality of genomic information may give insights into the optimal number of individuals to be used in GWA. This study investigated different discovery set sizes based on the number of largest eigenvalues explaining a certain proportion of variance in the genomic relationship matrix (G). In addition, we investigated the impact on the prediction accuracy by adding variants, which were selected based on different set sizes, to the regular single nucleotide polymorphism (SNP) chips used for genomic prediction.

Methods: We simulated sequence data that included 500k SNPs with 200 or 2000 quantitative trait nucleotides (QTN). A regular 50k panel included one in every ten simulated SNPs. Effective population size (Ne) was set to 20 or 200. GWA were performed using a number of genotyped animals equivalent to the number of largest eigenvalues of G (EIG) explaining 50, 60, 70, 80, 90, 95, 98, and 99% of the variance. In addition, the largest discovery set consisted of 30k genotyped animals. Limited or extensive phenotypic information was mimicked by changing the trait heritability. Significant and large-effect size SNPs were added to the 50k panel and used for single-step genomic best linear unbiased prediction (ssGBLUP).

Results: Using a number of genotyped animals corresponding to at least EIG98 allowed the identification of QTN with the largest effect sizes when Ne was large. Populations with smaller Ne required more than EIG98. Furthermore, including genotyped animals with a higher reliability (i.e., a higher trait heritability) improved the identification of the most informative QTN. Prediction accuracy was highest when the significant or the large-effect SNPs representing twice the number of simulated QTN were added to the 50k panel.

Conclusions: Accurately identifying causative variants from sequence data depends on the effective population size and, therefore, on the dimensionality of genomic information. This dimensionality can help identify the most suitable sample size for GWA and could be considered for variant selection, especially when resources are restricted. Even when variants are accurately identified, their inclusion in prediction models has limited benefits.
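
The two counting rules in the Methods can be illustrated with a minimal sketch. It is not the authors' pipeline. The function names `n_largest_eigenvalues` and `thin_panel` are hypothetical, the eigenvalues of G are assumed to have been computed elsewhere and passed in as plain numbers, and starting the thinned panel at the first SNP is an assumption (the abstract only says "one in every ten").

```python
from itertools import accumulate


def n_largest_eigenvalues(eigenvalues, proportion):
    """Smallest k such that the k largest eigenvalues explain at least
    `proportion` of the total variance (e.g. 0.98 for an EIG98-style count).

    `eigenvalues` is assumed to hold the eigenvalues of a genomic
    relationship matrix computed elsewhere; this helper is hypothetical.
    """
    if not 0.0 < proportion <= 1.0:
        raise ValueError("proportion must be in (0, 1]")
    ordered = sorted(eigenvalues, reverse=True)
    total = sum(ordered)
    if total <= 0:
        raise ValueError("eigenvalues must sum to a positive value")
    # Small tolerance so floating-point rounding does not push k up by one.
    target = proportion * total - 1e-12 * total
    for k, cumulative in enumerate(accumulate(ordered), start=1):
        if cumulative >= target:
            return k
    return len(ordered)


def thin_panel(n_snps, step=10):
    """Indices of a thinned panel keeping one SNP in every `step`.

    Starting at index 0 is an assumption made for illustration.
    """
    return list(range(0, n_snps, step))


if __name__ == "__main__":
    toy_eigenvalues = [5, 3, 1, 1]  # toy values, not simulated data
    for pct in (50, 80, 99):
        k = n_largest_eigenvalues(toy_eigenvalues, pct / 100)
        print(f"EIG{pct}: {k} eigenvalue(s)")
    print(len(thin_panel(500_000)), "SNPs in the thinned panel")
```

In the study's terms, the count returned for a proportion such as 0.98 would be the number of genotyped animals used in the corresponding discovery set.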

Funder

US Department of Agriculture's National Institute of Food and Agriculture

Publisher

Springer Science and Business Media LLC

Subject

Genetics; Animal Science and Zoology; General Medicine; Ecology, Evolution, Behavior and Systematics

Link

https://link.springer.com/content/pdf/10.1186/s12711-023-00823-0.pdf

References (40 articles)

1. Visscher PM, Wray NR, Zhang Q, Sklar P, McCarthy MI, Brown MA, et al. 10 years of GWAS discovery: biology, function, and translation. Am J Hum Genet. 2017;101:5–22.

2. Berisa T, Pickrell JK. Approximately independent linkage disequilibrium blocks in human populations. Bioinformatics. 2016;32:283–5.

3. Stam P. The distribution of the fraction of the genome identical by descent in finite random mating populations. Genet Res. 1980;35:131–55.