Abstract
Social media platforms such as Twitter generate a huge amount of unstructured data. The volume of tweets, and the velocity with which they are generated on various topics, pose extensive challenges for data analytics and processing techniques. The linguistic flexibility with which tweets are written creates further challenges for preprocessing and natural language processing (NLP) tasks. To address these challenges, this chapter selects, modifies, and applies information retrieval and preprocessing steps for retrieving, storing, organizing, and cleaning real-time, large-scale unstructured Twitter data. The work reviews previous research and applies suitable preprocessing methods to improve data quality by removing unessential data. It is also observed that using the Twitter APIs with access tokens provides easy access to real-time tweets. Preprocessing methods are fundamental steps in text analytics and NLP tasks that process unstructured data. Suitable preprocessing methods, such as tokenization, stop-word removal, stemming, and lemmatization, are analyzed and applied to normalize the extracted Twitter data.
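The normalization steps named in the abstract can be illustrated with a minimal Python sketch. It is not the chapter's implementation: the function names `preprocess_tweet` and `naive_stem`, the short stop-word list, and the suffix rules are all assumptions made for illustration. The stemmer is a toy stand-in for an established algorithm such as Porter's, and lemmatization is omitted because it requires a dictionary resource.

```python
import re

# Tiny illustrative stop-word list (an assumption; real lists are much longer).
STOP_WORDS = {"a", "an", "the", "is", "are", "of", "to", "and", "in", "on", "for", "with"}

URL_RE = re.compile(r"https?://\S+")   # links embedded in tweets
MENTION_RE = re.compile(r"@\w+")       # user mentions such as @user
TOKEN_RE = re.compile(r"[a-z0-9]+")    # runs of lowercase letters or digits


def naive_stem(token):
    """Strip a few common English suffixes; a toy stand-in for a real stemmer."""
    for suffix in ("ing", "ed", "es", "s"):
        # Keep at least three characters so short words are left alone.
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess_tweet(text):
    """Clean, tokenize, filter stop words, and stem a single tweet."""
    text = URL_RE.sub(" ", text)       # remove URLs
    text = MENTION_RE.sub(" ", text)   # remove user mentions
    text = text.lower()                # case folding
    tokens = TOKEN_RE.findall(text)    # tokenization; '#' is dropped, the hashtag word is kept
    tokens = [t for t in tokens if t not in STOP_WORDS]  # stop-word removal
    return [naive_stem(t) for t in tokens]               # stemming
```

For example, `preprocess_tweet("Loving the new #NLP tools from @user! https://t.co/abc")` returns `['lov', 'new', 'nlp', 'tool', 'from']`. The over-stemmed `lov` shows why a dictionary-based lemmatizer is often preferred when readable base forms matter.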
Cited by
4 articles.