Data-Driven Lexical Normalization for Medical Social Media

Published: 2019-08-20 Issue: 3 Volume: 3 Page: 60
ISSN: 2414-4088
Container-title: Multimodal Technologies and Interaction
Language: en
Short-container-title: MTI

Author:

Dirkson, Verberne, Sarker, Kraaij

Abstract

In the medical domain, user-generated social media text is increasingly used as a valuable complementary knowledge source to scientific medical literature. The extraction of this knowledge is complicated by colloquial language use and misspellings. However, lexical normalization of such data has not been addressed effectively. This paper presents a data-driven lexical normalization pipeline with a novel spelling correction module for medical social media. Our method significantly outperforms state-of-the-art spelling correction methods and can detect mistakes with an F1 of 0.63 despite extreme imbalance in the data. We also present the first corpus for spelling mistake detection and correction in a medical patient forum.

Funder

SIDN fonds

Publisher

MDPI AG

Subject

Computer Networks and Communications, Computer Science Applications, Human-Computer Interaction, Neuroscience (miscellaneous)

Link

https://www.mdpi.com/2414-4088/3/3/60/pdf

References: 54 articles.

1. Capturing the Patient’s Perspective: a Review of Advances in Natural Language Processing of Health-Related Text

2. Social Media Mining for Toxicovigilance: Automatic Monitoring of Prescription Medication Abuse from Twitter

3. Utilizing social media data for pharmacovigilance: A review