BACKGROUND
Machine learning models that exploit rich digital phenotyping data to forecast mental states could improve clinical practice in psychiatry. A current limitation is that such predictive models tend to disregard a common property of their prediction targets, namely the ordinal nature of the rating scales from which they are often derived.
OBJECTIVE
We first aimed to contrast the performance of ordinal regression with that of binary classification in predicting various mental states over different forecast horizons. We also assessed how the tree-based eXtreme Gradient Boosting (XGBoost) algorithm performs relative to the long short-term memory (LSTM) algorithm, a type of recurrent neural network popular in digital phenotyping studies that forecast mental states.
METHODS
The CrossCheck dataset includes self-reports of mental states and smartphone sensor data contributed by patients with schizophrenia. Participants completed surveys on various mental states every 2-3 days, answering on 4-point ordinal rating scales. Passive sensing data were collected continuously and aggregated over 6-hour periods. We trained 120 machine learning models to forecast mental states from passive sensing data: 10 mental states (e.g., Calm, Depressed, Seeing things) on 2 predictive tasks (ordinal regression, binary classification), with 2 learning algorithms (XGBoost, LSTM), over 3 forecast horizons (same day, next day, next week). Models were primarily evaluated with performance metrics that account for class imbalance (macro-averaged mean absolute error [MAMAE] for ordinal regression and balanced accuracy [BAcc] for binary classification), but the impact of using metrics that do not address imbalance (mean absolute error, accuracy) was also investigated.
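To make the two imbalance-aware metrics concrete, the sketch below implements them in plain Python. This is an illustration of the standard definitions (per-class MAE averaged over classes for MAMAE; mean per-class recall for BAcc), not the authors' evaluation code, and the toy labels are invented for demonstration.

```python
def mamae(y_true, y_pred):
    """Macro-averaged MAE: compute MAE within each true class, then
    average over classes so rare classes weigh as much as common ones."""
    classes = sorted(set(y_true))
    per_class = []
    for c in classes:
        errs = [abs(p - t) for t, p in zip(y_true, y_pred) if t == c]
        per_class.append(sum(errs) / len(errs))
    return sum(per_class) / len(classes)

def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall; 0.5 for a majority-class binary baseline."""
    classes = sorted(set(y_true))
    recalls = []
    for c in classes:
        hits = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        total = sum(1 for t in y_true if t == c)
        recalls.append(hits / total)
    return sum(recalls) / len(classes)

# Toy 4-point ordinal labels (0-3), skewed toward class 0, scored against
# a majority-class baseline that always predicts 0.
y_true = [0, 0, 0, 0, 0, 0, 1, 2, 3]
y_pred = [0, 0, 0, 0, 0, 0, 0, 0, 0]
print(mamae(y_true, y_pred))  # 1.5: rare classes dominate the macro average
print(balanced_accuracy([t > 0 for t in y_true], [p > 0 for p in y_pred]))  # 0.5
```

Note how the plain (micro) MAE of this baseline would be a flattering 0.67, whereas MAMAE exposes its failure on the rare classes; this is the distinction the evaluation design above hinges on.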
RESULTS
The dataset included 6364 surveys and 23,551 days of smartphone data from 62 participants. Marked class imbalance was observed for the ordinal labels, an issue only partially resolved by recoding the original labels into binary classes. Overall, 45/60 ordinal regression models performed significantly above baseline, with MAMAE ranging from 0.77 to 1.19, and 58/60 binary classification models were significant, with BAcc ranging from 58% to 73%. Of note, evaluation metrics that do not account for class imbalance misleadingly suggested good performance. After scaling performance metrics to allow comparison, ordinal regression and binary classification models achieved comparable performance on average. XGBoost models performed better than or on par with LSTM models. As the forecast horizon expanded, a significant yet very small decrease in performance was observed.
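The partial effect of binary recoding on imbalance can be illustrated with a small sketch. The class proportions below are invented for demonstration, not the CrossCheck distributions; the recoding threshold (lowest ordinal level vs. the rest) is likewise an assumed scheme.

```python
from collections import Counter

# Hypothetical skewed 4-point ordinal answers (0 = lowest severity ... 3 = highest).
ordinal = [0] * 70 + [1] * 20 + [2] * 7 + [3] * 3

# Recode as binary: class 0 vs. classes 1-3 merged.
binary = [int(x > 0) for x in ordinal]

# Merging the three minority classes softens the imbalance (70/30 instead
# of 70/20/7/3) but does not remove it, so imbalance-aware metrics remain
# necessary even for the binary task.
print(Counter(ordinal))
print(Counter(binary))
```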
CONCLUSIONS
The targets of mental state forecasting models should preserve the valuable clinical information contained in ordinal rating scales, especially since recoding multiple ordinal classes into binary classes does not yield any gain in predictive performance. Moreover, model development should account for class imbalance, particularly for ordinal regression, where imbalance across classes is often more pronounced. Finally, our findings do not support the implicitly assumed superiority of recurrent neural networks for forecasting.