Authors:
Majumder Maimuna S., Rose Sherri
Abstract
Background & Objective
During infectious disease outbreaks, health agencies often share information about cases and deaths. This information is usually text-based and rarely machine-readable, which creates challenges for outbreak researchers. Here, we introduce a generalizable data assembly algorithm that automatically curates text-based, outbreak-related information, and we demonstrate its performance across three outbreaks.
Methods
After developing an algorithm based on regular expressions, we used it to automatically curate data from health agencies via three information sources: formal reports, email newsletters, and Twitter. A validation data set was also curated manually for each outbreak.
Findings
When compared against the validation data sets, the algorithmically curated data had an overall cumulative missingness of ≤2% and an overall cumulative misidentification of ≤1% for all three outbreaks.
Conclusions
Within the context of outbreak research, our work successfully addresses the need for generalizable tools that can transform text-based information into machine-readable data across varied information sources and infectious diseases.
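To make the Methods and Findings more concrete, here is a minimal sketch of regular-expression curation followed by a comparison against a manually curated validation set. The abstract does not give the authors' actual patterns or their exact definitions of the two error metrics. The patterns, the function names `extract_counts` and `compare`, and the sample reports are therefore all hypothetical. Missingness is assumed here to mean the share of validation values the algorithm left empty. Misidentification is assumed to mean the share of values it filled in with the wrong number.

```python
import re
from typing import Dict, Optional, Tuple

# Hypothetical patterns: the paper's real regular expressions are not given.
# The lookbehind stops numbers embedded in names like "COVID-19" from matching.
CASES_RE = re.compile(
    r"(?<![\w-])(\d[\d,]*)\s+(?:new\s+)?(?:confirmed\s+)?cases?\b", re.IGNORECASE
)
DEATHS_RE = re.compile(r"(?<![\w-])(\d[\d,]*)\s+(?:new\s+)?deaths?\b", re.IGNORECASE)

Record = Dict[str, Optional[int]]


def _first_int(pattern: re.Pattern, text: str) -> Optional[int]:
    """Return the first number captured by `pattern` (e.g. '1,234' -> 1234), or None."""
    match = pattern.search(text)
    return int(match.group(1).replace(",", "")) if match else None


def extract_counts(text: str) -> Record:
    """Turn one free-text update into a machine-readable record."""
    return {"cases": _first_int(CASES_RE, text), "deaths": _first_int(DEATHS_RE, text)}


def compare(curated: Dict[str, Record], validation: Dict[str, Record]) -> Tuple[float, float]:
    """Assumed metrics: (missingness, misidentification) as fractions of all
    values in the manually curated validation set."""
    total = missing = wrong = 0
    for key, expected in validation.items():
        for field, true_value in expected.items():
            total += 1
            got = curated.get(key, {}).get(field)
            if got is None:
                missing += 1  # the algorithm found nothing for this value
            elif got != true_value:
                wrong += 1  # the algorithm found the wrong number
    if total == 0:
        return 0.0, 0.0
    return missing / total, wrong / total


if __name__ == "__main__":
    # Hypothetical agency updates keyed by report date.
    reports = {
        "2020-05-03": "As of 3 May, 1,234 confirmed cases and 56 deaths have been reported.",
        "2020-05-04": "Officials reported 40 new cases today.",
    }
    curated = {day: extract_counts(text) for day, text in reports.items()}
    validation = {
        "2020-05-03": {"cases": 1234, "deaths": 56},
        "2020-05-04": {"cases": 40, "deaths": 0},
    }
    print(curated["2020-05-03"])  # {'cases': 1234, 'deaths': 56}
    print(*compare(curated, validation))  # 0.25 0.0
```

In this toy example, the second report never mentions deaths, so the pattern finds nothing. A human curator, by contrast, might record zero. That gap shows up as missingness rather than misidentification, which is why the sketch tracks the two error types separately.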
Publisher
Cold Spring Harbor Laboratory