Abstract
Determining the appropriate sample size (N) for comparative biological experiments is critical for obtaining reliable results. The usual approach is to perform a power calculation, which requires knowing the variability between samples and the expected effect size. Here, we focused on bulk RNA-seq experiments, which have become ubiquitous in biology but have many parameters that are unknown or difficult to estimate, so the analysis required to determine the minimum N is typically lacking. We therefore performed two N=30 profiling studies comparing wild-type mice with mice in which one copy of a gene had been deleted, to determine how many mice would be required to minimize false positives and to maximize recovery of the true discoveries found in the N=30 experiment. Results from experiments with an N of 4 or fewer are shown to be highly misleading, given the substantial false positive rate and the failure to discover genes later found with higher N. For a cutoff of 2-fold expression differences, we found that an N of 6-7 mice was required to consistently decrease the false positive rate to below 50%, and that “more is always better” when it came to discovery rates: an N of 8-12 is significantly better in lowering the false positive rate. A common method to reduce the false discovery rate in underpowered experiments is to raise the fold-change cutoff or increase the stringency of the P-value threshold, and to include only highly perturbed, highly significant genes. We show that this strategy is no substitute for increasing the N of the experiment, because it results in consistently inflated effect sizes and a substantial drop in sensitivity. These data should help others choose their Ns, since it is often not practical to perform such large studies for every mouse model.
Publisher
Cold Spring Harbor Laboratory