Evaluating the Performance of GPT-assisted Identification and Classification of Eating Disorders with Text-based Chinese Social Media Data (Preprint)

Authors:

Yan Tianqiang, Zhang Yucheng, Han Jiayi, Liu Zhiyuan, Barnhart Wesley, Sun Shaojing, Zhou Jianjun, Ji Feng, He Jinbo

Abstract

BACKGROUND

Eating disorders (EDs) are associated with an array of negative health outcomes and are a major public health concern globally, including in China. However, rates of detection and treatment-seeking for EDs in China are low, and rates of effective treatment are lower still. Thus, exploring new ways to detect and classify EDs has significant implications for ED prevention and treatment in China.

OBJECTIVE

This study aimed to evaluate the performance of large language models (LLMs), particularly OpenAI’s GPT-4, in identifying and classifying EDs using real-world, plain-text Chinese social media data.

METHODS

We evaluated the performance of LLMs on two hierarchical tasks: the Phase 1 task of judging whether a sample was ED-positive, and the Phase 2 task of inferring the specific ED subtype of positive samples, namely anorexia nervosa (AN), bulimia nervosa (BN), or binge-eating disorder (BED). GPT-4 was selected as the representative state-of-the-art LLM and was guided by natural language instructions using zero-shot Chain-of-Thought (CoT) prompting based on manually edited ED criteria. The performance of GPT-4 was compared with that of three baseline schemes: ERNIE 3.0, 1-gram Bag-of-Words (BoW), and 3-gram BoW. Performance was quantified through overall accuracy and linear accuracy.
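To illustrate what the 1-gram and 3-gram BoW baselines operate on, the following is a minimal sketch of n-gram bag-of-words counting. It is not the authors' implementation: the abstract does not describe tokenization, so character-level n-grams are an assumption here, as is the reading of "3-gram" as exactly three contiguous characters. The function name `ngram_bow` and the sample string are hypothetical.

```python
from collections import Counter


def ngram_bow(text: str, n: int) -> Counter:
    """Count contiguous character n-grams in `text`.

    Assumption: character-level units with whitespace removed. The study
    may instead have used word segmentation, which is not shown here.
    """
    chars = [ch for ch in text if not ch.isspace()]
    # A text shorter than n yields an empty range and therefore no n-grams.
    return Counter("".join(chars[i:i + n]) for i in range(len(chars) - n + 1))


# Hypothetical placeholder text, not data from the study.
sample = "abcab"
unigrams = ngram_bow(sample, 1)  # counts of single characters
trigrams = ngram_bow(sample, 3)  # counts of three-character windows
```

In a baseline of this kind, the resulting counts would typically be turned into a fixed-length vector and passed to a conventional classifier; which classifier the study used is not stated in the abstract.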

RESULTS

In the Phase 1 task of identifying ED-positive samples, GPT-4 showed the lowest overall accuracy of 0.768, compared with that of the baselines (0.810-0.818). However, in the Phase 2 task of classifying AN, BN, and BED, GPT-4 outperformed the others, with a linear accuracy of 0.943 (0.687-0.877 for baselines) and an overall accuracy of 0.887 (0.373-0.753 for baselines).

CONCLUSIONS

These findings suggest that GPT-4’s zero-shot in-context learning capability may be better suited to semantically complex classification, such as distinguishing ED subtypes (AN, BN, and BED), whereas the baseline methods (ERNIE 3.0, 1-gram BoW, and 3-gram BoW) may be better suited to the initial identification of probable EDs.

Publisher

JMIR Publications Inc.
