Affiliation:
1. Thammasat University, Thailand
Abstract
Social media platforms are critical for disaster communication and relief efforts, and effective disaster response requires rapid and precise analysis of social media posts. This article presents a comprehensive study of a framework that combines crowdsourcing and text mining techniques to enhance data extraction from social media. The research focuses on a case study of medical supply requests during the COVID-19 pandemic and yields several key findings. First, incorporating domain-specific data when training named entity recognition (NER) models is essential for accurately identifying and retrieving important entities, such as the names of medical supplies and hospitals. Second, a hybrid system improves the extraction of information from social media posts. Finally, crowdsourcing plays a significant role in validating, verifying, and filtering disorganised information within the hybrid system. Our performance analysis demonstrates that hybrid models can significantly improve the extraction of supply names (by up to 37%) and hospital names (by up to 66%), especially in the absence of a comprehensive vocabulary or specially trained NER models. During the COVID-19 supply shortage in Thailand, volunteers used hybrid models to expedite the identification of the necessary information. Experimental results demonstrated a significant improvement in the accuracy of extracted data, the ability to acquire relevant information in real time, the capacity to handle a substantial number of posts, and the practical benefit of the proposed framework.