The FGLOCTweet Corpus: An English tweet-based corpus for fine-grained location-detection tasks

Published: 2022 Issue: 1 Volume: 10 Page: 117-133
ISSN: 2243-4712
Container-title: Research in Corpus Linguistics
Language: en
Short-container-title: RiCL

Author:

Fernández-Martínez, Nicolás José¹

Affiliation:

1. Catholic University of Murcia

Abstract

Location detection in social-media microtexts is an important natural language processing task for emergency-based contexts where locative references are identified in text data. Spatial information obtained from texts is essential to understand where an incident happened, where people are in need of help and/or which areas have been affected. This information contributes to raising emergency situation awareness, which is then passed on to emergency responders and competent authorities to act as quickly as possible. Annotated text data are necessary for building and evaluating location-detection systems. The problem is that available corpora of tweets for location-detection tasks are either lacking or, at best, annotated with coarse-grained location types (e.g. cities, towns, countries, some buildings, etc.). To bridge this gap, we present our semi-automatically annotated corpus, the Fine-Grained LOCation Tweet Corpus (FGLOCTweet Corpus), an English tweet-based corpus for fine-grained location-detection tasks, including fine-grained locative references (i.e. geopolitical entities, natural landforms, points of interest and traffic ways) together with their surrounding locative markers (i.e. direction, distance, movement or time). It includes annotated tweet data for training and evaluation purposes, which can be used to advance research in location detection, as well as in the study of the linguistic representation of place or of the microtext genre of social media.

Publisher

Research in Corpus Linguistics

Subject

Ocean Engineering

Cited by 6 articles.

1. DLRGeoTweet: A comprehensive social media geocoding corpus featuring fine-grained places;Information Processing & Management;2024-07

2. Applying social media in emergency response: an attention-based bidirectional deep learning system for location reference recognition in disaster tweets;Applied Intelligence;2024-04

3. Geo-knowledge-guided GPT models improve the extraction of location descriptions from disaster-related social media messages;International Journal of Geographical Information Science;2023-10-09

4. IDRISI-RE: A generalizable dataset with benchmarks for location mention recognition on disaster tweets;Information Processing & Management;2023-05

5. Role of Geolocation Prediction in Disaster Management;International Handbook of Disaster Research;2023