Text Mining and Pre-Processing Methods for Social Media Data Extraction and Processing

Author:

Kumari Santoshi¹

Affiliation:

1. RUAS, India

Abstract

A huge amount of unstructured data is generated on social media platforms such as Twitter. The volume of tweets and the velocity at which they are generated on various topics present extensive challenges for data analytics and processing techniques. The linguistic flexibility with which tweets are written also creates many challenges for preprocessing and natural language processing (NLP) tasks. To address these challenges, this chapter selects, modifies, and applies information retrieval and preprocessing steps for retrieving, storing, organizing, and cleaning real-time, large-scale unstructured Twitter data. The work reviews previous research and applies suitable preprocessing methods to improve data quality by removing non-essential data. It is also observed that using Twitter APIs and access tokens provides easy access to real-time tweets. Preprocessing methods are fundamental steps of text analytics and NLP tasks for processing unstructured data. Suitable preprocessing methods, such as tokenization, stop-word removal, stemming, and lemmatization, are analyzed and applied to normalize the extracted Twitter data.
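The preprocessing steps named in the abstract (cleaning, tokenization, stop-word removal, stemming) can be sketched as a small pipeline. This is an illustrative sketch only, not the chapter's implementation: it uses only the Python standard library, the stop-word list is a deliberately tiny hypothetical one, and `naive_stem` is a crude suffix-stripping stand-in for a real stemmer such as NLTK's `PorterStemmer`. Lemmatization is omitted because it needs a lexicon (for example WordNet via NLTK's `WordNetLemmatizer`).

```python
import re

# Hypothetical, deliberately tiny stop-word list for illustration only;
# real pipelines usually load a fuller list (e.g. NLTK's English stop words).
STOP_WORDS = {
    "a", "an", "the", "is", "are", "was", "were", "be", "to", "of",
    "and", "or", "in", "on", "for", "with", "this", "that", "it", "at",
}

URL_RE = re.compile(r"https?://\S+")           # links such as https://t.co/...
MENTION_RE = re.compile(r"@\w+")               # user mentions such as @WHO
TOKEN_RE = re.compile(r"[a-z0-9]+(?:'[a-z]+)?")  # words, optionally with 's, 't, ...


def clean_tweet(text: str) -> str:
    """Lowercase a tweet and drop URLs, @mentions, and the '#' of hashtags."""
    text = text.lower()
    text = URL_RE.sub(" ", text)
    text = MENTION_RE.sub(" ", text)
    return text.replace("#", " ")  # keep the hashtag word, drop the symbol


def tokenize(text: str) -> list[str]:
    """Split cleaned text into lowercase alphanumeric word tokens."""
    return TOKEN_RE.findall(text)


def remove_stop_words(tokens: list[str]) -> list[str]:
    """Filter out tokens that appear in the stop-word list."""
    return [t for t in tokens if t not in STOP_WORDS]


def naive_stem(token: str) -> str:
    """Crude suffix stripping; a simplified stand-in for a real stemmer."""
    for suffix in ("ing", "ed", "es", "s"):
        # Only strip if at least three characters of the stem remain.
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(text: str) -> list[str]:
    """Clean, tokenize, remove stop words, and stem a single tweet."""
    tokens = remove_stop_words(tokenize(clean_tweet(text)))
    return [naive_stem(t) for t in tokens]


if __name__ == "__main__":
    sample = "Vaccines are rolling out in the cities! https://t.co/xyz @WHO #COVID19"
    print(preprocess(sample))  # ['vaccin', 'roll', 'out', 'citi', 'covid19']
```

Keeping each step as its own function mirrors the way the steps are usually discussed separately, and makes it easy to swap the naive stemmer or the stop-word list for a library implementation.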

Publisher

IGI Global

Cited by 4 articles.

1. Optimized machine learning model discourse analysis;Education and Information Technologies;2024-02-08

2. Computational linguistics based text emotion analysis using enhanced beetle antenna search with deep learning during COVID-19 pandemic;PeerJ Computer Science;2023-12-06

3. Online User Reviews Investigation Towards Madura Island Tourism using Latent Semantic Analysis;2022 IEEE 8th Information Technology International Seminar (ITIS);2022-10-19

4. Impact of Preprocessing on Twitter Based Covid-19 Vaccination Text Data by Classification Techniques;2022 International Conference on Applied Artificial Intelligence and Computing (ICAAIC);2022-05-09
