Abstract
Background
Computational psychiatry has the potential to advance the diagnosis, mechanistic understanding, and treatment of mental health conditions. Promising results from clinical samples have led to calls to extend these methods to mental health risk assessment in the general public; however, data typically used with clinical samples are neither available nor scalable for research in the general population. Digital phenotyping addresses this by capitalizing on the multimodal and widely available data created by sensors embedded in personal digital devices (eg, smartphones) and is a promising approach to extending computational psychiatry methods to improve mental health risk assessment in the general population.
Objective
Building on recommendations from existing computational psychiatry and digital phenotyping work, we aim to create the first computational psychiatry data set that is tailored to studying mental health risk in the general population; includes multimodal, sensor-based behavioral features; and is designed to be widely shared across academia, industry, and government using gold standard methods for privacy, confidentiality, and data integrity.
Methods
We are using a stratified random sampling design with 2 crossed factors (difficulties with emotion regulation and perceived life stress) to recruit a sample of 400 community-dwelling adults balanced across high and low risk for episodic mental health conditions. Participants first complete self-report questionnaires assessing current and lifetime psychiatric and medical diagnoses and treatment, as well as current psychosocial functioning. Participants then complete a 7-day in situ data collection phase in which they provide daily audio recordings, passive sensor data collected from their smartphones, self-reports of daily mood and significant events, and a verbal description of the significant daily events during a nightly phone call. Participants repeat the baseline questionnaires 6 and 12 months after this phase. Self-report questionnaires will be scored using standard methods. Raw audio and passive sensor data will be processed to create a suite of daily summary features (eg, time spent at home).
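As an illustration of the kind of daily summary feature mentioned above, the following is a minimal sketch of how "time spent at home" might be derived from passive location samples. It is not the study's processing pipeline: the function names, the input format (time-sorted tuples of timestamp, latitude, and longitude), the known home coordinate, the 100 m radius, and the 15-minute gap cap are all assumptions made for this example.

```python
from collections import defaultdict
from datetime import datetime, timedelta
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in meters


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two latitude/longitude points."""
    p1, p2 = radians(lat1), radians(lat2)
    dphi = p2 - p1
    dlmb = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(p1) * cos(p2) * sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * asin(sqrt(a))


def daily_minutes_at_home(samples, home, radius_m=100.0,
                          max_gap=timedelta(minutes=15)):
    """Sum minutes per calendar day spent within radius_m of home.

    samples: time-sorted list of (datetime, lat, lon) tuples (assumed format).
    home: (lat, lon) of the participant's home (assumed to be known).
    Each interval between consecutive samples is credited to the earlier
    sample's location and to the calendar day on which the interval starts.
    Intervals longer than max_gap are treated as missing data and skipped.
    """
    minutes = defaultdict(float)
    for (t0, lat, lon), (t1, _, _) in zip(samples, samples[1:]):
        gap = t1 - t0
        if gap <= timedelta(0) or gap > max_gap:
            continue  # out-of-order sample or sensor dropout
        if haversine_m(lat, lon, home[0], home[1]) <= radius_m:
            minutes[t0.date()] += gap.total_seconds() / 60
    return dict(minutes)
```

Capping the gap between samples is one simple way to avoid counting long sensor dropouts as time at home. A real pipeline would also need to estimate the home location and handle intervals that cross midnight more carefully.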
Results
Data collection began in June 2022 and is expected to conclude by July 2024. To date, 310 participants have consented to the study; 149 have completed the baseline questionnaire and 7-day intensive data collection phase; and 61 and 31 have completed the 6- and 12-month follow-up questionnaires, respectively. Once data collection is complete, the data set will be made available to academic researchers, industry, and government using a stepped approach to maximize data privacy.
Conclusions
This data set is designed to complement current computational psychiatry and digital phenotyping research, with the goal of advancing mental health risk assessment within the general population. It aims to support the field’s move away from siloed research laboratories collecting proprietary data and toward interdisciplinary collaborations that incorporate clinical, technical, and quantitative expertise at all stages of the research process.
International Registered Report Identifier (IRRID)
DERR1-10.2196/53857