BACKGROUND
Due to the prevalence of depression, its often chronic course, recurrence, and associated disability, early detection and non-intrusive monitoring are crucial tools for timely diagnosis and treatment, remission of depression, and prevention of relapse, thereby limiting its impact on quality of life and well-being. Existing successful attempts at exploiting artificial intelligence for early classification of depression are mostly data-driven and thus non-transparent, and they lack effective means to deal with uncertainties.
OBJECTIVE
We present an approach towards designing an explainable, knowledge-based artificial intelligence for classification of symptoms of depression. The aim of the study was to define an end-to-end framework for extracting observable depression cues from diary recordings, and to evaluate the framework's potential as a feasible solution for detecting symptoms of depression automatically from observable behavior cues.
METHODS
First, we defined an end-to-end framework for extracting depression cues (i.e., facial, speech, and language features), and stored them as a digital patient resource (i.e., a Fast Healthcare Interoperability Resources [FHIR] resource). Second, we extracted these cues from 28 video recordings from the SymptomMedia dataset (14 simulating a variety of diagnoses of depression, and 14 simulating other mental health-related diagnoses) and 27 recordings from the DAIC-WOZ dataset (12 classified as having moderate or severe symptoms of depression, and 15 without any depressive symptoms), and compared the presence of the extracted features between recordings of individuals with depressive disorder and those without.
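As an illustration of the comparison step, the following minimal Python sketch computes one language cue (the share of first-person singular pronouns in a transcript) and averages it per group. It is not the framework's implementation: the pronoun list, the tokenization, and the function names `first_person_rate` and `compare_groups` are assumptions made for this example.

```python
import re
from statistics import mean

# Assumed pronoun list for illustration; the lexicon used in the study is not specified here.
FIRST_PERSON_SINGULAR = {"i", "me", "my", "mine", "myself"}


def first_person_rate(transcript: str) -> float:
    """Return the share of tokens that are first-person singular pronouns."""
    # Simple alphabetic tokenization; contractions such as "I'm" split into "i" and "m".
    tokens = re.findall(r"[a-z]+", transcript.lower())
    if not tokens:
        return 0.0
    hits = sum(1 for token in tokens if token in FIRST_PERSON_SINGULAR)
    return hits / len(tokens)


def compare_groups(group_a: list[str], group_b: list[str]) -> tuple[float, float]:
    """Return the mean cue rate for each (non-empty) group of transcripts."""
    rate_a = mean([first_person_rate(t) for t in group_a])
    rate_b = mean([first_person_rate(t) for t in group_b])
    return rate_a, rate_b
```

A call such as `compare_groups(depressed_transcripts, control_transcripts)`, where both arguments are hypothetical lists of transcript strings, returns the two group means. A real analysis would add an appropriate statistical test and handle the other cue types (speech, facial) separately.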
RESULTS
We identified several cues that, consistent with previous studies, distinguished between individuals with and without depressive disorders across both datasets. These included language cues (i.e., use of first-person singular pronouns, use of negatively valenced words, explicit mentions of treatment of depression, some features of language complexity), speech cues (i.e., speaking rate, voiced speech and pauses, low articulation rate, monotonous speech), and facial cues (i.e., rotational energy of head movements). Other defined cues require further research.
CONCLUSIONS
The nature and context of the discourse, the impact of other disorders and of physical or psychological stress, and the quality and resolution of recordings play an important role in the alignment of digital features with the relevant background. The work presented in this paper provides a novel approach for the extraction of a wide array of cues relevant for depression classification, and opens up new opportunities for further research.