Abstract
AbstractMachine learning studies have shown that various phenotypes can be predicted from structural and functional brain images. However, in most such studies, prediction performance ranged from moderate to disappointing. It is unclear whether prediction performance will substantially improve with larger sample sizes or whether insufficient predictive information in brain images impedes further progress. Here, we systematically assess the effect of sample size on prediction performance using sample sizes far beyond what is possible in common neuroimaging studies. We project 3-9 fold improvements in prediction performance for behavioral and mental health phenotypes when moving from one thousand to one million samples. Moreover, we find that moving from single imaging modalities to multimodal input data can lead to further improvements in prediction performance, often on par with doubling the sample size. Our analyses reveal considerable performance reserves for neuroimaging-based phenotype prediction. Machine learning models may benefit much more from extremely large neuroimaging datasets than currently believed.
Publisher
Cold Spring Harbor Laboratory
Cited by
12 articles.
订阅此论文施引文献
订阅此论文施引文献,注册后可以免费订阅5篇论文的施引文献,订阅后可以查看论文全部施引文献