Abstract
The peaking phenomenon refers to the observation that, beyond a certain point, the performance of prediction models decreases as the number of predictors (p) increases. This issue is commonly encountered in small datasets (colloquially known as "small n, large p" datasets, or high-dimensional data). It was recently reported, based on an analysis of data from five placebo-controlled trials, that clinical prediction models in schizophrenia showed poor performance (average balanced accuracy, BAC, 0.54). This was interpreted to suggest that prediction models in schizophrenia have poor generalizability. In this paper we demonstrate that this outcome more likely reflects the peaking phenomenon in a small n, large p dataset (n=1513 participants, p=217), and we generalize this to a set of illustrative cases using simulated data. We then demonstrate that an ensemble of supervised learning models trained on more data (18 placebo-controlled trials, n=4634 participants), but fewer predictors (p=33), achieves better prediction (average BAC = 0.64) that generalizes to out-of-sample studies as well as to data from active-controlled trials (n=1463, average BAC = 0.67). Based on these findings, we argue that the achievable prediction accuracy for treatment response in schizophrenia, and likely for many other medical conditions, is highly dependent on sample size and the number of included predictors, and hence remains unknown until more data have been analyzed. Finally, we provide recommendations for how researchers and data holders might improve future data analysis efforts in clinical prediction.
Publisher
Cold Spring Harbor Laboratory