Affiliation:
1. Georgetown University, Washington, United States
Abstract
Accurate estimates of user location are important for many online services, including event detection, disaster management, and determining public opinion. Neural network-based techniques have proven to be highly effective in predicting user location. However, these models typically require a large amount of labeled training data, which can be difficult to obtain in real-world scenarios. In this article, we present two approaches to tackle the issue of limited training data when predicting city-level location. First, we consider a self-supervised approach that trains a state-level model without labeled data and then integrates this knowledge into the training dataset used for city-level predictions. Second, we explore the option of increasing the number of training examples by utilizing external resources to generate synthetic users. Finally, we combine these two strategies, exploiting the benefits of both. We empirically evaluate our proposed techniques on multiple Twitter/X datasets and show that our models perform significantly better than the state of the art, with improvements of up to 6% for Acc@161 and 8% for F1 score.
Funder
National Science Foundation
National Collaborative on Gun Violence Research
Massive Data Institute (MDI) at Georgetown University
Publisher
Association for Computing Machinery (ACM)