A platform for connecting social media data to domain-specific topics using large language models: an application to student mental health

Author:

Ruocco Leonard (1, 2), Zhuang Yuqian (1, 2), Ng Raymond (1, 3), Munthali Richard J (2), Hudec Kristen L (2), Wang Angel Y (2), Vereschagin Melissa (2), Vigo Daniel V (2, 4)

Affiliation:

1. Data Science Institute, University of British Columbia, Vancouver, British Columbia V6T 1Z4, Canada

2. Department of Psychiatry, University of British Columbia, Vancouver, British Columbia V6T 2A1, Canada

3. Department of Computer Science, University of British Columbia, Vancouver, British Columbia V6T 1Z4, Canada

4. Department of Global Health and Social Medicine, Harvard University, Boston, MA 02115, United States

Abstract

Objectives: To design a novel artificial intelligence-based software platform that allows users to analyze text data by identifying coherent topics and the parts of the data related to a specific research theme-of-interest (TOI).

Materials and Methods: Our platform uses state-of-the-art unsupervised natural language processing methods, built on top of a large language model, to analyze social media text data. At the center of the platform’s functionality is BERTopic, which clusters social media posts, forming collections of words that represent distinct topics. A key feature of our platform is its ability to identify whole sentences corresponding to topic words, vastly improving the platform’s ability to perform downstream similarity operations with respect to a user-defined TOI.

Results: Two case studies on mental health among university students are performed to demonstrate the utility of the platform, focusing on signals within social media (Reddit) data related to depression and their connection to various emergent themes within the data.

Discussion and Conclusion: Our platform provides researchers with a readily available and inexpensive tool to parse large quantities of unstructured, noisy data into coherent themes and to identify portions of the data related to the research TOI. While the development process for the platform was focused on mental health themes, we believe it to be generalizable to other domains of research as well.
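The downstream similarity step mentioned under Materials and Methods, in which whole sentences representing each topic are compared against a user-defined TOI, can be illustrated with a minimal sketch. This is not the authors' implementation: the platform builds on BERTopic and a large language model, whereas the sketch below substitutes a plain bag-of-words cosine similarity so that it runs with the Python standard library alone. The function names (`bag_of_words`, `cosine`, `rank_topics`), the topic labels, and the example sentences are all hypothetical and made up for illustration.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Lowercase the text and count its alphabetic tokens."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse token-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_topics(topic_sentences, toi):
    """Rank topics by how similar their representative sentence is to the TOI.

    Assumption: each topic is summarized by one whole sentence. A real
    pipeline would likely use language-model embeddings instead of counts.
    """
    toi_vec = bag_of_words(toi)
    scored = [
        (topic_id, cosine(bag_of_words(sentence), toi_vec))
        for topic_id, sentence in topic_sentences.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical topics, each represented by one made-up sentence.
topic_sentences = {
    "exams": "I have three midterm exams next week with no time to study",
    "mood": "I feel hopeless and depressed and cannot get out of bed",
    "housing": "Rent near campus is too expensive for most students",
}
toi = "feeling depressed and hopeless"

ranking = rank_topics(topic_sentences, toi)
for topic_id, score in ranking:
    print(f"{topic_id}: {score:.3f}")
```

Comparing whole sentences rather than isolated topic words gives the similarity measure more context to work with, which is the motivation the abstract gives for that design choice. With count vectors the benefit is limited to extra overlapping tokens; embedding-based similarity would be expected to capture paraphrases as well.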

Funder

Health Canada’s Substance Use and Addictions Program

Publisher

Oxford University Press (OUP)

Subject

Health Informatics
