Text Mining and Natural Language Processing Approaches for Automatic Categorization of Lay Requests to Web-Based Expert Forums

Published:2009-07-22 Issue:3 Volume:11 Page:e25
ISSN:1438-8871
Container-title:Journal of Medical Internet Research
language:en
Short-container-title:J Med Internet Res

Author:

Himmel Wolfgang,Reincke Ulrich,Michelmann Hans Wilhelm

Abstract

Background: Both healthy and sick people increasingly use electronic media to obtain medical information and advice. For example, Internet users may send requests to Web-based expert forums, or so-called “ask the doctor” services.

Objective: To automatically classify lay requests to an Internet medical expert forum using a combination of different text-mining strategies.

Methods: We first manually classified a sample of 988 requests directed to an involuntary childlessness forum on the German website “Rund ums Baby” (“Everything about Babies”) into one or more of 38 categories belonging to two dimensions (“subject matter” and “expectations”). After creating start and synonym lists, we calculated the average Cramer’s V statistic for the association of each word with each category. We also used principal component analysis and singular value decomposition as further text-mining strategies. With these measures we trained regression models and, on the basis of the best regression models, determined for each request the probability of belonging to each of the 38 categories, with a cutoff of 50%. Recall and precision on a test sample were calculated as measures of quality for the automatic classification.

Results: According to the manual classification of 988 documents, 102 (10%) documents fell into the category “in vitro fertilization (IVF),” 81 (8%) into the category “ovulation,” 79 (8%) into “cycle,” and 57 (6%) into “semen analysis.” These were the four most frequent categories in the subject matter dimension (consisting of 32 categories). The expectation dimension comprised six categories; we classified 533 documents (54%) as “general information” and 351 (36%) as a wish for “treatment recommendations.” The generation of indicator variables based on the chi-square analysis and Cramer’s V proved to be the best approach for automatic classification in about half of the categories. In combination with the two other approaches, 100% precision and 100% recall were realized in 18 (47%) of the 38 categories in the test sample. For 35 (92%) categories, precision and recall were better than 80%. For some categories, the input variables (ie, “words”) also included variables from other categories, most often with a negative sign. For example, absence of words predictive for “menstruation” was a strong indicator for the category “pregnancy test.”

Conclusions: Our approach suggests a way of automatically classifying and analyzing unstructured information in Internet expert forums. The technique can perform a preliminary categorization of new requests and help Internet medical experts to better handle the mass of information and to give professional feedback.
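As an illustration of two quantities named in the abstract, the following is a minimal Python sketch, not the authors' implementation. It computes Cramer's V for the association between a word's presence and membership in one category, and precision and recall after applying a 50% probability cutoff. The function names, the toy data, and the choice to treat a probability of exactly 0.5 as a positive prediction are assumptions made for this example only.

```python
from math import sqrt


def cramers_v(word_present, in_category):
    """Cramer's V for two binary indicators (a 2x2 contingency table).

    Both arguments are equal-length sequences of 0/1 (or False/True),
    one entry per document. Hypothetical helper for illustration.
    """
    n = len(word_present)
    if n == 0:
        return 0.0
    # Observed counts: table[word][category].
    table = [[0, 0], [0, 0]]
    for w, c in zip(word_present, in_category):
        table[int(bool(w))][int(bool(c))] += 1
    row = [sum(r) for r in table]
    col = [table[0][j] + table[1][j] for j in range(2)]
    # Pearson chi-square statistic from observed vs expected counts.
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row[i] * col[j] / n
            if expected > 0:
                chi2 += (table[i][j] - expected) ** 2 / expected
    # For a 2x2 table, min(rows, cols) - 1 == 1, so V = sqrt(chi2 / n).
    return sqrt(chi2 / n)


def precision_recall(probabilities, truth, cutoff=0.5):
    """Precision and recall for one category at a probability cutoff.

    Assumption: a document is assigned to the category when its
    predicted probability is >= cutoff.
    """
    predicted = [p >= cutoff for p in probabilities]
    tp = sum(1 for p, t in zip(predicted, truth) if p and t)
    fp = sum(1 for p, t in zip(predicted, truth) if p and not t)
    fn = sum(1 for p, t in zip(predicted, truth) if not p and t)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


if __name__ == "__main__":
    # Toy data: the word occurs in exactly the documents of the category.
    print(cramers_v([1, 1, 0, 0], [1, 1, 0, 0]))
    # Toy model probabilities for four documents, two truly in the category.
    print(precision_recall([0.9, 0.6, 0.4, 0.2], [1, 0, 1, 0]))
```

In the study, such word-level association scores fed regression models across 38 categories; the sketch covers a single word and a single category only.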

Publisher

JMIR Publications Inc.

Subject

Health Informatics

References: 36 articles.

1. Medical text-based consultations on the Internet: A 4-year study

2. The use of an Internet-based Ask the Doctor Service involving family physicians: evaluation by a web survey

3. Advice from a Medical Expert through the Internet on Queries about AIDS and Hepatitis: Analysis of a Pilot Experiment

4. Requests for medical advice from patients and families to health care providers who publish on the World Wide Web

5. Patients Looking for Information on the Internet and Seeking Teleadvice

Cited by 25 articles.

1. LDA Model-Based Analysis of ESG Report Themes Extraction -The Example of A-Share Listed Companies In 2021;Proceedings of the 2023 6th International Conference on Information Management and Management Science;2023-08-25

2. Emotional reactions to infertility diagnosis: thematic and natural language processing analyses of the 1000 Dreams survey;Reproductive BioMedicine Online;2023-02

3. Online Diagnosis-Treatment Department Recommendation based on Machine Learning in China;2022-08-26

4. The Role of Online Social Support in Patients Undergoing Infertility Treatment – A Comparison of Pregnant and Non-pregnant Members;Health Communication;2021-04-15

5. Studying Online Support for Caregivers of Patients With Alzheimer's Disease in China;International Journal of Healthcare Information Systems and Informatics;2020-10