Abstract
Brain imaging research enjoys increasing adoption of supervised machine learning for single-participant disease classification. Yet, the success of these algorithms likely depends on population diversity, including demographic differences and other factors that may be outside of primary scientific interest. Here, we capitalize on propensity scores as a composite confound index to quantify diversity due to major sources of population variation. We delineate the impact of population heterogeneity on predictive accuracy and pattern stability in two separate clinical cohorts: the Autism Brain Imaging Data Exchange (ABIDE, n = 297) and the Healthy Brain Network (HBN, n = 551). Across various analysis scenarios, our results uncover the extent to which cross-validated prediction performances are interlocked with diversity. The instability of extracted brain patterns attributable to diversity is located preferentially in regions that are part of the default mode network. Collectively, our findings highlight the limitations of prevailing deconfounding practices in mitigating the full consequences of population diversity.
Funder
Healthy Brains for Healthy Lives
Canadian Network for Research and Innovation in Machining Technology, Natural Sciences and Engineering Research Council of Canada
Canadian Institutes of Health Research
SickKids Foundation
Azrieli Center for Autism Research
Brain Canada
Tier-2 Canada Research Chairs program
National Institutes of Health
Healthy Brains Healthy Lives initiative
Google
CIFAR Artificial Intelligence Chairs program
National Research Foundation Singapore
NUS Yong Loo Lin School of Medicine
National Medical Research Council
Publisher
Public Library of Science (PLoS)
Subject
General Agricultural and Biological Sciences
General Immunology and Microbiology
General Biochemistry, Genetics and Molecular Biology
General Neuroscience