Use of Machine Learning Tools in Evidence Synthesis of Tobacco Use Among Sexual and Gender Diverse Populations: Algorithm Development and Validation

Published: 2024-01-24
Volume: 8
Page: e49031
ISSN: 2561-326X
Container title: JMIR Formative Research
Language: en
Short container title: JMIR Form Res

Author:

Ma Shaoying, Jiang Shuning, Yang Olivia, Zhang Xuanzhi, Fu Yu, Zhang Yusen, Kaareen Aadeeba, Ling Meng, Chen Jian, Shang Ce

Abstract

Background: From 2016 to 2021, the volume of peer-reviewed publications related to tobacco has experienced a significant increase. This presents a considerable challenge in efficiently summarizing, synthesizing, and disseminating research findings, especially when it comes to addressing specific target populations, such as the LGBTQ+ (lesbian, gay, bisexual, transgender, queer, intersex, asexual, Two Spirit, and other persons who identify as part of this community) populations.

Objective: In order to expedite evidence synthesis and research gap discoveries, this pilot study has the following three aims: (1) to compile a specialized semantic database for tobacco policy research to extract information from journal article abstracts, (2) to develop natural language processing (NLP) algorithms that comprehend the literature on nicotine and tobacco product use among sexual and gender diverse populations, and (3) to compare the discoveries of the NLP algorithms with an ongoing systematic review of tobacco policy research among LGBTQ+ populations.

Methods: We built a tobacco research domain–specific semantic database using data from 2993 paper abstracts from 4 leading tobacco-specific journals, with enrichment from other publicly available sources. We then trained an NLP model to extract named entities after learning patterns and relationships between words and their context in text, which further enriched the semantic database. Using this iterative process, we extracted and assessed studies relevant to LGBTQ+ tobacco control issues, further comparing our findings with an ongoing systematic review that also focuses on evidence synthesis for this demographic group.

Results: In total, 33 studies were identified as relevant to sexual and gender diverse individuals’ nicotine and tobacco product use. Consistent with the ongoing systematic review, the NLP results showed that there is a scarcity of studies assessing policy impact on this demographic using causal inference methods. In addition, the literature is dominated by US data. We found that the product drawing the most attention in the body of existing research is cigarettes or cigarette smoking and that the number of studies of various age groups is almost evenly distributed between youth or young adults and adults, consistent with the research needs identified by the US health agencies.

Conclusions: Our pilot study serves as a compelling demonstration of the capabilities of NLP tools in expediting the processes of evidence synthesis and the identification of research gaps. While future research is needed to statistically test the NLP tool’s performance, there is potential for NLP tools to fundamentally transform the approach to evidence synthesis.
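The screening step described in the Methods can be illustrated with a small sketch. This is not the authors' pipeline: the abstract describes a trained NLP model for named entity extraction, whereas the example below swaps in plain dictionary matching to show the general idea of tagging abstracts and flagging those that mention both a sexual or gender diverse population and a tobacco product. The lexicon, labels, and function names are all hypothetical.

```python
import re

# Hypothetical mini-lexicon for illustration only; the paper's actual
# semantic database is not reproduced here.
LEXICON = {
    "POPULATION": ["lgbtq+", "sexual minority", "gender diverse",
                   "transgender", "bisexual", "lesbian", "gay"],
    "PRODUCT": ["cigarette", "e-cigarette", "cigar", "smokeless tobacco"],
    "AGE_GROUP": ["youth", "young adult", "adult"],
}


def build_patterns(lexicon):
    """Compile one case-insensitive regex per entity label."""
    patterns = {}
    for label, terms in lexicon.items():
        # Longest terms first so "young adult" is preferred over "adult".
        alts = "|".join(re.escape(t) for t in sorted(terms, key=len, reverse=True))
        # Lookarounds keep "cigarette" from matching inside "e-cigarette"
        # and "cigar" from matching inside "cigarette"; "s?" allows plurals.
        patterns[label] = re.compile(
            rf"(?<![\w-])({alts})s?(?![\w-])", re.IGNORECASE
        )
    return patterns


def tag_entities(text, patterns):
    """Return the sorted, de-duplicated lexicon terms found for each label."""
    return {
        label: sorted({m.group(1).lower() for m in pattern.finditer(text)})
        for label, pattern in patterns.items()
    }


def is_relevant(tags):
    """Assumed relevance rule: a population term and a product term both appear."""
    return bool(tags["POPULATION"]) and bool(tags["PRODUCT"])
```

Applying `tag_entities` to each abstract and counting the `PRODUCT` and `AGE_GROUP` terms among the relevant ones would give tallies of the kind reported in the Results. A dictionary matcher like this misses paraphrases and context that a trained model can pick up.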

Publisher

JMIR Publications Inc.

Subject

Health Informatics, Medicine (miscellaneous)
