Abstract
Power analyses are often used to determine the number of animals required for a genome-wide association study (GWAS). These analyses are typically intended to estimate the sample size needed for at least one locus to exceed a genome-wide significance threshold. A related but less commonly considered question is how many significant loci will be discovered with a given sample size. We used simulations based on a real dataset of 3,173 male and female adult N/NIH heterogeneous stock (HS) rats to explore the relationship between sample size and the number of significant loci discovered. Our simulations examined the number of loci identified in sub-samples of the full dataset. The sub-sampling analysis was conducted for four traits with low (0.15 ± 0.03), medium (0.31 ± 0.03 and 0.36 ± 0.03), and high (0.46 ± 0.03) SNP-based heritabilities. For each trait, we sub-sampled the data 100 times at each of five sample sizes (500, 1,000, 1,500, 2,000, and 2,500). We observed an exponential increase in the number of significant loci with larger sample sizes. Our results are consistent with similar observations in human GWAS and imply that future rodent GWAS should use sample sizes that are substantially larger than those needed to obtain a single significant result.
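The sub-sampling design described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' pipeline: the function names are hypothetical, animals are assumed to be drawn without replacement, the seed is arbitrary, and `count_significant_loci` is a placeholder for whatever GWAS procedure returns the number of genome-wide significant loci for a given set of animals.

```python
import random
from statistics import mean

N_ANIMALS = 3173                               # size of the full dataset (from the abstract)
SAMPLE_SIZES = (500, 1000, 1500, 2000, 2500)   # sub-sample sizes (from the abstract)
N_REPLICATES = 100                             # sub-samples drawn per size (from the abstract)


def draw_subsamples(n_animals, sample_sizes, n_replicates, seed=0):
    """Return {sample_size: [list of animal indices, ...]}.

    Assumption: each replicate is drawn without replacement from the
    full set of animals; the abstract does not specify the scheme.
    """
    rng = random.Random(seed)
    population = range(n_animals)
    return {
        size: [rng.sample(population, size) for _ in range(n_replicates)]
        for size in sample_sizes
    }


def mean_loci_per_size(subsamples, count_significant_loci):
    """Average the number of significant loci over replicates, per sample size.

    `count_significant_loci` is a hypothetical callable: it takes a list of
    animal indices, runs a GWAS on those animals, and returns a count.
    """
    return {
        size: mean(count_significant_loci(indices) for indices in replicates)
        for size, replicates in subsamples.items()
    }
```

Calling `draw_subsamples(N_ANIMALS, SAMPLE_SIZES, N_REPLICATES)` once per trait and passing the result to `mean_loci_per_size` with a trait-specific GWAS function would give one curve of mean loci count against sample size per trait.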
Publisher
Cold Spring Harbor Laboratory