BACKGROUND
Eating disorders (EDs) are associated with an array of negative health outcomes and are a major public health concern globally, including in China. However, rates of detection and treatment-seeking for EDs in China are low, and the rate of effective treatment is even lower. Thus, exploring new ways to detect and classify EDs has significant implications for ED prevention and treatment in China.
OBJECTIVE
This study aimed to evaluate the performance of large language models (LLMs), particularly OpenAI’s GPT-4, in identifying and classifying EDs using real-world Chinese plain-text social media data.
METHODS
We evaluated LLM performance on two hierarchical tasks: a Phase 1 task of judging whether a sample was ED-positive, and a Phase 2 task of inferring the specific ED subtype of positive samples, namely anorexia nervosa (AN), bulimia nervosa (BN), or binge-eating disorder (BED). GPT-4 was selected as the representative state-of-the-art LLM and was instructed in natural language using zero-shot Chain-of-Thought (CoT) prompting based on manually edited ED criteria. Its performance was compared with three baselines: ERNIE 3.0, 1-gram Bag-of-Words (BoW), and 3-gram BoW. Performance was quantified by overall accuracy and linear accuracy.
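The two-phase setup and the BoW baselines can be illustrated with a minimal sketch. It is not the study's implementation: the function names are hypothetical, the n-grams are assumed to be character-level (the abstract does not say whether characters or words were used), and Phase 2 is assumed to receive the samples flagged positive in Phase 1. Only overall accuracy is shown, because the abstract does not define linear accuracy.

```python
from collections import Counter


def ngram_bow(text, n):
    """Count the character n-grams in a string.

    Character-level n-grams are an assumption made for this sketch.
    """
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def select_phase2_samples(samples, is_positive):
    """Keep only the samples flagged ED-positive, for Phase 2 subtyping."""
    if len(samples) != len(is_positive):
        raise ValueError("samples and flags must have the same length")
    return [s for s, flag in zip(samples, is_positive) if flag]


def overall_accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label equals the true label."""
    if len(y_true) != len(y_pred):
        raise ValueError("label lists must have the same length")
    if not y_true:
        raise ValueError("cannot compute accuracy on an empty list")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Toy data, invented purely for illustration.
toy_true = ["AN", "BN", "BED", "AN"]
toy_pred = ["AN", "BN", "AN", "AN"]
toy_accuracy = overall_accuracy(toy_true, toy_pred)
```

With n = 1 the feature vector counts single characters, and with n = 3 it counts overlapping three-character sequences, which is one plausible reading of the 1-gram and 3-gram BoW baselines.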
RESULTS
In the Phase 1 task of identifying ED-positive samples, GPT-4 showed the lowest overall accuracy (0.768), compared with 0.810-0.818 for the baselines. However, in the Phase 2 task of classifying AN, BN, and BED, GPT-4 outperformed all baselines, with a linear accuracy of 0.943 (0.687-0.877 for baselines) and an overall accuracy of 0.887 (0.373-0.753 for baselines).
CONCLUSIONS
These findings suggest that GPT-4’s zero-shot in-context learning capability may be better suited to classifying semantically complex categories such as ED subtypes (AN, BN, and BED), whereas conventional, non-LLM methods (ERNIE 3.0, 1-gram BoW, and 3-gram BoW) may be better suited to the initial identification of probable EDs.