Abstract
Purpose
The main aim of this paper is to develop an approach for analyzing tourist content posted on social media. The approach incorporates information extraction, cleaning, data processing, and descriptive and content analysis, and it can be used on different social media platforms such as Instagram and Facebook. This work proposes an approach to social media analytics for traveler-generated content (TGC). The authors apply it to Twitter and examine data about the city and the province of Granada.
Design/methodology/approach
To identify what people discuss and post on social media about places, events, restaurants, hotels, etc., the authors propose the following approach for data collection, cleaning and data analysis. The authors first identify the main keywords for the place of study. A descriptive analysis is subsequently performed, covering post metrics with geo-tagged analysis and user metrics: retweets and likes, comments, videos, photos and followers. The text is then cleaned. Finally, content analysis is conducted, which includes word frequency calculation, sentiment and emotion detection, and word clouds. Topic modeling was also performed with latent Dirichlet allocation (LDA).
Findings
The authors used the framework to collect 262,859 tweets about Granada. The most important hashtags are #Alhambra and #SierraNevada, and the most prolific user is @AlhambraCultura. The approach uses a seasonal context, and the posted tweets are divided into two periods (spring–summer and autumn–winter). Word frequency was calculated, and Granada and Alhambra are again the most frequent words in both periods, in English and in Spanish. The topic models show the subjects that are mentioned in both languages, and although there are certain small differences in terms of language and season, the Alhambra, Sierra Nevada and gastronomy stand out as the most important topics.
Research limitations/implications
Sarcasm is extremely difficult to identify, posts may be ambiguous, users may mix Spanish and English words in their tweets, and tweets may contain spelling mistakes, colloquialisms or abbreviations. Multilingualism is also an important limitation, since it is not clear how tweets written in different languages should be processed. The size of the data set is also an important factor, since the greater the amount of data, the better the results. One of the main limitations is the small number of geo-tagged tweets, as geo-tagging would provide information about the place where the tweet was posted and about opinions of that place.
Originality/value
This study proposes a way to analyze social media data that bridges the tourism and social media literature in the data analysis context, and it contributes to discovering patterns and features of a tourism destination through social media. The approach provides the prospective traveler with an overview of the most popular places and the major posters for a particular tourist destination. From a business perspective, it informs managers of the most influential users, and the information obtained can be extremely useful for managing their tourism products in that region.
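The text cleaning and seasonal word frequency steps named in the abstract could look roughly like the following Python sketch. It is an illustration only, not the authors' implementation: the function names, the tiny stopword list, the cleaning rules and the month boundaries used for the two periods (March to August as spring–summer) are all assumptions, since the abstract does not specify them.

```python
import re
from collections import Counter
from datetime import date

# Assumed, deliberately tiny English/Spanish stopword list (illustrative only).
STOPWORDS = {"the", "a", "in", "of", "and", "de", "la", "el", "en", "y"}


def clean_tweet(text):
    """Lowercase a tweet, drop URLs and @mentions, and return content tokens."""
    text = text.lower()
    text = re.sub(r"https?://\S+", " ", text)  # remove links
    text = re.sub(r"@\w+", " ", text)          # remove user mentions
    # Keep runs of letters (including Spanish accents); '#' is dropped,
    # so a hashtag such as #alhambra is counted as the word "alhambra".
    tokens = re.findall(r"[a-záéíóúüñ]+", text)
    return [t for t in tokens if t not in STOPWORDS]


def season(day):
    """Assumed split: March to August is spring-summer, the rest autumn-winter."""
    return "spring-summer" if 3 <= day.month <= 8 else "autumn-winter"


def word_frequencies(tweets):
    """Count cleaned tokens per period for an iterable of (date, text) pairs."""
    freqs = {"spring-summer": Counter(), "autumn-winter": Counter()}
    for day, text in tweets:
        freqs[season(day)].update(clean_tweet(text))
    return freqs


if __name__ == "__main__":
    # Made-up sample tweets, used only to exercise the functions.
    sample = [
        (date(2019, 7, 1), "Visiting the #Alhambra in Granada! https://t.co/x @friend"),
        (date(2019, 12, 5), "Nieve en Sierra Nevada, Granada"),
    ]
    for period, counts in word_frequencies(sample).items():
        print(period, counts.most_common(3))
```

Counting per period rather than over the whole corpus mirrors the seasonal comparison described in the findings. A real pipeline would also need language detection and fuller stopword lists for each language.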
Subject
Tourism, Leisure and Hospitality Management
Cited by
13 articles.