HumAID: Human-Annotated Disaster Incidents Data from Twitter with Deep Learning Benchmarks

Published: 2021-05-22 Volume: 15 Page: 933-942
ISSN: 2334-0770
Container-title: Proceedings of the International AAAI Conference on Web and Social Media
Short-container-title: ICWSM

Author:

Alam Firoj, Qazi Umair, Imran Muhammad, Ofli Ferda

Abstract

Social networks are widely used for information consumption and dissemination, especially during time-critical events such as natural disasters. Despite its large volume, social media content is often too noisy for direct use in any application. Therefore, it is important to filter, categorize, and concisely summarize the available content to facilitate effective consumption and decision-making. To address such issues, automatic classification systems have been developed using supervised modeling approaches, thanks to earlier efforts on creating labeled datasets. However, existing datasets are limited in different aspects (e.g., size, presence of duplicates) and are less suitable for supporting more advanced and data-hungry deep learning models. In this paper, we present a new large-scale dataset with ~77K human-labeled tweets, sampled from a pool of ~24 million tweets across 19 disaster events that happened between 2016 and 2019. Moreover, we propose a data collection and sampling pipeline, which is important for social media data sampling for human annotation. We report multiclass classification results using classic and deep learning (fastText and transformer) based models to set the ground for future studies. The dataset and associated resources are publicly available at https://crisisnlp.qcri.org/humaid_dataset.html.

Publisher

Association for the Advancement of Artificial Intelligence (AAAI)

Cited by 23 articles.

1. Multimodal Social Sensing for the Spatio-Temporal Evolution and Assessment of Nature Disasters;Sensors;2024-09-11

2. ADSumm: annotated ground-truth summary datasets for disaster tweet summarization;Social Network Analysis and Mining;2024-08-05

3. IKDSumm: Incorporating key-phrases into BERT for extractive disaster tweet summarization;Computer Speech & Language;2024-08

4. BenchIMP: A Benchmark for Quantitative Evaluation of the Incident Management Process Assessment;Proceedings of the 19th International Conference on Availability, Reliability and Security;2024-07-30

5. The Effect of Training Data Size on Disaster Classification from Twitter;Information;2024-07-08