Exploring a method for extracting concerns of multiple breast cancer patients in the domain of patient narratives using BERT and its optimization by domain adaptation using masked language modeling

Published:2024-09-06 Issue:9 Volume:19 Page:e0305496
ISSN:1932-6203
Container-title:PLOS ONE
language:en
Short-container-title:PLoS ONE

Author:

Watabe Satoshi, Watanabe Tomomi, Yada Shuntaro, Aramaki Eiji, Yajima Hiroshi, Kizaki Hayato, Hori Satoko

Abstract

Narratives posted on the internet by patients contain a vast amount of information about various concerns. This study aimed to extract multiple concerns from interviews with breast cancer patients using the natural language processing (NLP) model bidirectional encoder representations from transformers (BERT). A total of 508 interview transcriptions of breast cancer patients written in Japanese were labeled with five types of concern labels: "treatment," "physical," "psychological," "work/financial," and "family/friends." The labeled texts were used to create a multi-label classifier by fine-tuning a pre-trained BERT model. Prior to fine-tuning, we also created several classifiers with domain adaptation using (1) breast cancer patients’ blog articles and (2) breast cancer patients’ interview transcriptions. The performance of the classifiers was evaluated in terms of precision through 5-fold cross-validation. With fine-tuning alone, the multi-label classifiers achieved precision above 0.80 for two of the five concerns, "physical" and "work/financial." In contrast, precision for "treatment" was low, at approximately 0.25. With domain adaptation, however, precision for this label ranged from 0.40 to 0.51, an improvement of more than 0.2 in some cases. This study showed that combining domain adaptation with a multi-label classifier on target data made it possible to efficiently extract multiple concerns from interviews.
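
The evaluation described in the abstract (per-label precision under 5-fold cross-validation) can be illustrated with a minimal sketch. This is not the authors' code: the function names `per_label_precision` and `kfold_indices`, the binary indicator encoding of labels, the unshuffled contiguous folds, and the convention of reporting 0.0 for a label that is never predicted are all assumptions made here for illustration. Only the five label names and the count of 508 transcriptions come from the abstract.

```python
from typing import Dict, List, Sequence

# Label order taken from the abstract; everything else is illustrative.
LABELS = ["treatment", "physical", "psychological", "work/financial", "family/friends"]


def per_label_precision(
    y_true: Sequence[Sequence[int]],
    y_pred: Sequence[Sequence[int]],
    labels: Sequence[str] = LABELS,
) -> Dict[str, float]:
    """Precision = TP / (TP + FP), computed separately for each label.

    Each row of y_true / y_pred is a 0/1 indicator vector, one entry per label.
    """
    result: Dict[str, float] = {}
    for j, name in enumerate(labels):
        tp = sum(1 for t, p in zip(y_true, y_pred) if p[j] == 1 and t[j] == 1)
        fp = sum(1 for t, p in zip(y_true, y_pred) if p[j] == 1 and t[j] == 0)
        # Assumed convention: report 0.0 when the label is never predicted.
        result[name] = tp / (tp + fp) if (tp + fp) else 0.0
    return result


def kfold_indices(n_samples: int, k: int = 5) -> List[List[int]]:
    """Split range(n_samples) into k contiguous, near-equal test folds (no shuffling)."""
    base, extra = divmod(n_samples, k)
    folds: List[List[int]] = []
    start = 0
    for i in range(k):
        size = base + (1 if i < extra else 0)  # first `extra` folds get one more item
        folds.append(list(range(start, start + size)))
        start += size
    return folds
```

In a full pipeline, each fold returned by `kfold_indices(508)` would serve once as the held-out test set for a fine-tuned classifier, and `per_label_precision` would be applied to that fold's predictions before averaging across folds.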

Funder

JSPS KAKENHI

JST CREST

Publisher

Public Library of Science (PLoS)

References (44 articles)

1. H Sung. Global cancer statistics 2020: GLOBOCAN estimates of incidence and mortality worldwide for 36 cancers in 185 countries. CA Cancer J Clin, 2021.

2. Cancer Statistics in Japan-2019. Tokyo, Japan: Foundation for Promotion of Cancer Research; Mar 2020.

3. C Allemani. Global surveillance of cancer survival 1995–2009: analysis of individual data for 25,676,887 patients from 279 population-based registries in 67 countries (CONCORD-2). Lancet, 2015.

4. A Syrowatka. Predictors of distress in female breast cancer survivors: a systematic review. Breast Cancer Res Treat.

5. JB Fassier. Developing a Return to Work Intervention for Breast Cancer Survivors with the Intervention Mapping Protocol: Challenges and Opportunities of the Needs Assessment. Front Public Health, 2018.