Abstract
Twitter’s APIs are now the main data source for social media researchers. A large number of studies have utilized Twitter data for diverse research interests. Twitter users can share their precise real-time location, and Twitter APIs can provide this information as longitude and latitude. These geotagged Twitter data can help to study human activities and movements for different applications. Compared to the mostly small-scale data samples in different domains, such as social science, collecting geotagged data offers large samples. There is a fundamental question whether geotagged users can represent non-geotagged users. While some studies have investigated the question from different perspectives, they did not investigate profile information and the contents of tweets of geotagged and non-geotagged users. This empirical study addresses this limitation by applying text mining, statistical analysis, and machine learning techniques on Twitter data comprising more than 88,000 users and over 170 million tweets. Our findings show that there is a significant difference (p-value < 0.001) between geotagged and non-geotagged users based on 73% of the features obtained from the users’ profiles and tweets. The features can also help to distinguish between geotagged and non-geotagged users with around 80% accuracy. This research illustrates that geotagged users do not represent the Twitter population.
Funder
Office of the Vice President for Research, University of South Carolina
Subject
Earth and Planetary Sciences (miscellaneous),Computers in Earth Sciences,Geography, Planning and Development
Reference74 articles.
1. Twitter by the Numbers: Stats, Demographics & Fun Factshttps://www.omnicoreagency.com/twitter-statistics/#:~:text=Twitter%20Demographics&text=There%20are%20262%20million%20International,users%20have%20higher%20college%20degrees.
2. Twitter: Number of Monthly Active U.S. Users 2010–2019https://www.statista.com/statistics/274564/monthly-active-twitter-users-in-the-united-states/
3. Twitter and Research: A Systematic Literature Review Through Text Mining
4. Building a National Neighborhood Dataset From Geotagged Twitter Data for Indicators of Happiness, Diet, and Physical Activity
5. Identifying and Analyzing Health-Related Themes in Disinformation Shared by Conservative and Liberal Russian Trolls on Twitter
Cited by
22 articles.
订阅此论文施引文献
订阅此论文施引文献,注册后可以免费订阅5篇论文的施引文献,订阅后可以查看论文全部施引文献