
Lexical normalization for social media text

Published: 2013-01 Issue: 1 Volume: 4 Pages: 1-27
ISSN: 2157-6904
Journal: ACM Transactions on Intelligent Systems and Technology
Language: en
Journal abbreviation: ACM Trans. Intell. Syst. Technol.

Author:

Han Bo¹, Cook Paul², Baldwin Timothy¹

Affiliation:

1. NICTA Victoria Research Laboratory and The University of Melbourne, Australia

2. The University of Melbourne, Australia

Abstract

Twitter provides access to large volumes of data in real time, but is notoriously noisy, hampering its utility for NLP. In this article, we target out-of-vocabulary words in short text messages and propose a method for identifying and normalizing lexical variants. Our method uses a classifier to detect lexical variants, and generates correction candidates based on morphophonemic similarity. Both word similarity and context are then exploited to select the most probable correction candidate for the word. The proposed method doesn't require any annotations, and achieves state-of-the-art performance over an SMS corpus and a novel dataset based on Twitter.
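The abstract describes a three-stage pipeline: detect a lexical variant, generate correction candidates by morphophonemic similarity, then rank the candidates using both word similarity and context. The toy Python sketch below illustrates only that overall shape and is not the authors' implementation. It makes several assumptions. Every out-of-vocabulary token is treated as a variant, standing in for the paper's classifier. Morphophonemic similarity is approximated by Levenshtein distance plus a simplified Soundex-style key. Context is scored by log-smoothed counts of the preceding word. `LEXICON`, `BIGRAMS`, `phonetic_key`, and the additive scoring rule are hypothetical.

```python
import math
from collections import Counter

# Hypothetical toy lexicon and bigram counts (not from the paper).
LEXICON = {"i", "see", "sea", "you", "tomorrow", "tomato", "the"}
BIGRAMS = Counter({("i", "see"): 20, ("see", "you"): 50, ("you", "tomorrow"): 30})

def levenshtein(a: str, b: str) -> int:
    """Character edit distance via the standard dynamic program."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]

def phonetic_key(word: str) -> str:
    """Simplified Soundex-like key (an assumption, not the paper's encoding)."""
    if not word:
        return ""
    codes = {**dict.fromkeys("bfpv", "1"), **dict.fromkeys("cgjkqsxz", "2"),
             **dict.fromkeys("dt", "3"), "l": "4", **dict.fromkeys("mn", "5"), "r": "6"}
    word = word.lower()
    key, last = word[0], codes.get(word[0], "")
    for ch in word[1:]:
        code = codes.get(ch, "")
        if code and code != last:  # skip uncoded letters and repeated codes
            key += code
        last = code                # uncoded letters (vowels etc.) reset the run
    return (key + "000")[:4]

def candidates(oov: str, lexicon: set, max_edits: int = 2) -> list:
    """IV words that are close in spelling OR share the phonetic key."""
    key = phonetic_key(oov)
    return [w for w in sorted(lexicon)  # sorted for deterministic tie-breaking
            if levenshtein(oov, w) <= max_edits or phonetic_key(w) == key]

def score(cand: str, oov: str, prev_word, bigrams: Counter) -> float:
    """Hypothetical additive score: spelling similarity + left-context support."""
    similarity = -levenshtein(oov, cand)
    context = math.log(bigrams[(prev_word, cand)] + 1) if prev_word else 0.0
    return similarity + context

def normalize(tokens: list, lexicon: set, bigrams: Counter) -> list:
    out = []
    for tok in tokens:
        low = tok.lower()
        if low in lexicon:  # in-vocabulary: keep as-is
            out.append(low)
            continue
        prev = out[-1] if out else None
        cands = candidates(low, lexicon)
        best = max(cands, key=lambda c: score(c, low, prev, bigrams)) if cands else tok
        out.append(best)
    return out

print(normalize(["i", "se", "u", "tmrw"], LEXICON, BIGRAMS))
# prints ['i', 'see', 'you', 'tomorrow']
```

In this toy run, context does the deciding work twice. "se" is one edit from both "see" and "sea", and the preceding "i" favours "see". "u" is closer in spelling to "i" than to "you", but the bigram count after "see" tips the choice to "you". "tmrw" is recovered only through the phonetic key, since it is four edits from "tomorrow".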

Funder

Australian Research Council

Communication and Digital Economy

Publisher

Association for Computing Machinery (ACM)

Subject

Artificial Intelligence, Theoretical Computer Science

Link

https://dl.acm.org/doi/pdf/10.1145/2414425.2414430

References (56 articles)

1. Brants T. and Franz A. 2006. Web 1T 5-gram Version 1.

2. An improved error model for noisy channel spelling correction

Cited by 86 articles.

1. Near-real-time earthquake-induced fatality estimation using crowdsourced data and large-language models; International Journal of Disaster Risk Reduction; 2024-09

2. Advancing language models through domain knowledge integration: a comprehensive approach to training, evaluation, and optimization of social scientific neural word embeddings; Journal of Computational Social Science; 2024-05-22

3. A normalization model for repeated letters in social media hate speech text based on rules and spelling correction; PLOS ONE; 2024-03-21

4. Towards finding the lost generation of autistic adults: A deep and multi-view learning approach on social media; Knowledge-Based Systems; 2023-09

5. 2 Challenges of Persian NLP: The Importance of Text Normalization; Persian Computational Linguistics and NLP; 2023-05-08