Author:
Huisman Sophie M.,Kraiss Jannis T.,de Vos Jan Alexander
Abstract
BackgroundClinicians collect session therapy notes within patient session records. Session records contain valuable information about patients’ treatment progress. Sentiment analysis is a tool to extract emotional tones and states from text input and could be used to evaluate patients’ sentiment during treatment over time. This preliminary study aims to investigate the validity of automated sentiment analysis on session patient records within an eating disorder (ED) treatment context against the performance of human raters.MethodsA total of 460 patient session records from eight participants diagnosed with an ED were evaluated on their overall sentiment by an automated sentiment analysis and two human raters separately. The inter-rater agreement (IRR) between the automated analysis and human raters and IRR among the human raters was analyzed by calculating the intra-class correlation (ICC) under a continuous interpretation and weighted Cohen’s kappa under a categorical interpretation. Furthermore, differences regarding positive and negative matches between the human raters and the automated analysis were examined in closer detail.ResultsThe ICC showed a moderate automated-human agreement (ICC = 0.55), and the weighted Cohen’s kappa showed a fair automated-human (k = 0.29) and substantial human-human agreement (k = 0.68) for the evaluation of overall sentiment. Furthermore, the automated analysis lacked words specific to an ED context.Discussion/conclusionThe automated sentiment analysis performed worse in discerning sentiment from session patient records compared to human raters and cannot be used within practice in its current state if the benchmark is considered adequate enough. Nevertheless, the automated sentiment analysis does show potential in extracting sentiment from session records. The automated analysis should be further developed by including context-specific ED words, and a more solid benchmark, such as patients’ own mood, should be established to compare the performance of the automated analysis to.