BACKGROUND
Unlike research project-based health data collection methods, such as questionnaires and interviews, social media platforms allow patients to freely discuss their health status and obtain peer support. Previous literature has pointed out that both public-facing websites and private Facebook groups can serve as data sources for patient-reported outcomes.
OBJECTIVE
This study aimed to use machine learning-based natural language processing (NLP) techniques to identify concerns regarding postoperative quality of life and symptom burden in patients with uterine fibroids after focused ultrasound ablation surgery.
METHODS
Screenshots taken from clinician-patient WeChat groups were converted into free text using image text recognition technology and served as the research object of this study. We used regular expressions in Python to search for symptom burdens in over 900,000 words of WeChat group chats associated with 408 patients diagnosed with uterine fibroids at Chongqing Haifu Hospital between 2010 and 2020. We first built a symptom corpus by manually coding 30% of the WeChat texts and then used regular expressions to extract symptom information from the remaining texts based on this corpus. We compared the results with a manual review (gold standard) of the same records. A mixed methods approach was used to assess the relationship between population baseline data and conceptual symptoms; both quantitative and qualitative results were examined.
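The regular expression step and its comparison with manual review can be illustrated with a minimal sketch. It is not the study's actual code: the lexicon below is hypothetical, its patterns are written in English for readability (the WeChat texts would need patterns in the language of the chats), and apart from dysmenorrhea the symptom names are assumptions chosen for illustration. The function names `extract_symptoms` and `precision_recall` are likewise illustrative.

```python
import re

# Hypothetical symptom lexicon (concept -> pattern). In the study, the lexicon
# would come from the manually coded 30% of texts; these entries are assumed.
SYMPTOM_PATTERNS = {
    "dysmenorrhea": re.compile(
        r"\b(?:dysmenorrh(?:o)?ea|menstrual (?:pain|cramps?)|painful periods?)\b",
        re.IGNORECASE,
    ),
    "vaginal_discharge": re.compile(
        r"\bvaginal (?:discharge|fluid)\b", re.IGNORECASE
    ),
    "abdominal_pain": re.compile(
        r"\b(?:abdominal|belly|stomach) (?:pain|ache)\b", re.IGNORECASE
    ),
}


def extract_symptoms(text):
    """Return the set of symptom concepts whose pattern matches the text."""
    return {name for name, pattern in SYMPTOM_PATTERNS.items() if pattern.search(text)}


def precision_recall(predicted, gold):
    """Compare per-record predicted symptom sets with manually coded sets."""
    tp = sum(len(p & g) for p, g in zip(predicted, gold))  # found and correct
    fp = sum(len(p - g) for p, g in zip(predicted, gold))  # found but not in gold
    fn = sum(len(g - p) for p, g in zip(predicted, gold))  # in gold but missed
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Illustrative records only, not data from the study.
records = [
    "I have had menstrual cramps since the surgery",
    "Feeling fine this week",
]
predicted = [extract_symptoms(record) for record in records]
```

Scoring each record as a set of concepts keeps the comparison with the gold standard simple, since one record can mention several symptoms.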
RESULTS
After data cleaning, a total of 190,000 words of free text from patients with uterine fibroids were obtained. A total of 408 patients were included in the study. The age of the patients was 39.94±6.81 years, and their BMI was 23.47±29.37 kg/m^2. The median reporting times of the seven major symptoms were 21, 26, 57, 2, 18, 30, and 49 days. Patients with dysmenorrhea were younger and slimmer (mean (SD), P<.05), had lower fertility and parity (P<.05), and tended to stay longer in the hospital (P<.05). Logistic regression models identified menstrual duration (odds ratio (OR) (95% CI)), age at menarche (OR (95% CI)), reported symptoms before surgery (OR (95% CI)), and the number and size of fibroids as significant risk factors for postoperative symptoms.
CONCLUSIONS
Unstructured free text from social media platforms, extracted using NLP technology, can be analyzed to capture conceptual information about patients' health-related quality of life (HRQoL), identify high-risk groups, and track the reporting time of certain symptoms. This supports personalized treatment for patients at different stages of recovery to improve their quality of life. Python-based text mining of free-text data can accurately extract symptom burden and save considerable time compared with manual review, maximizing the utility of the existing information in population-based electronic health records for comparative effectiveness research.