Abstract
Background
The innovative method of sentiment analysis based on an emotional lexicon shows prominent advantages in capturing emotional information, such as individual attitudes, experiences, and needs, which provides a new perspective and method for emotion recognition and management for patients with breast cancer (BC). However, at present, sentiment analysis in the field of BC is limited, and there is no emotional lexicon for this field. Therefore, it is necessary to construct an emotional lexicon that conforms to the characteristics of patients with BC so as to provide a new tool for accurate identification and analysis of the patients’ emotions and a new method for their personalized emotion management.
Objective
This study aimed to construct an emotional lexicon of patients with BC.
Methods
Emotional words were obtained by merging the words in 2 general sentiment lexicons, the Chinese Linguistic Inquiry and Word Count (C-LIWC) and HowNet, and the words in text corpora acquired from patients with BC via Weibo, semistructured interviews, and expressive writing. The lexicon was constructed using manual annotation and classification under the guidance of Russell’s valence-arousal space. Ekman’s basic emotional categories, Lazarus’ cognitive appraisal theory of emotion, and a qualitative text analysis based on the text corpora of patients with BC were combined to determine the fine-grained emotional categories of the lexicon we constructed. Precision, recall, and the F1-score were used to evaluate the lexicon’s performance.
Results
The text corpora collected from patients in different stages of BC included 150 written materials, 17 interviews, and 6689 original posts and comments from Weibo, with a total of 1,923,593 Chinese characters. The emotional lexicon of patients with BC contained 9357 words and covered 8 fine-grained emotional categories: joy, anger, sadness, fear, disgust, surprise, somatic symptoms, and BC terminology. Experimental results showed that precision, recall, and the F1-score of positive emotional words were 98.42%, 99.73%, and 99.07%, respectively, and those of negative emotional words were 99.73%, 98.38%, and 99.05%, respectively, which all significantly outperformed the C-LIWC and HowNet.
Conclusions
The emotional lexicon with fine-grained emotional categories conforms to the characteristics of patients with BC. Its performance related to identifying and classifying domain-specific emotional words in BC is better compared to the C-LIWC and HowNet. This lexicon not only provides a new tool for sentiment analysis in the field of BC but also provides a new perspective for recognizing the specific emotional state and needs of patients with BC and formulating tailored emotional management plans.