Abstract
The COVID-19 pandemic has led to an exponential surge in published literature, both accurate and inaccurate, a phenomenon commonly referred to as an infodemic. In the context of searching for COVID-19-related scientific literature, we present an information retrieval methodology for effectively finding relevant publications for different information needs. Our multi-stage information retrieval architecture combines probabilistic weighting models with re-ranking algorithms based on neural masked language models. The methodology was evaluated in the context of the TREC-COVID challenge, achieving results competitive with those of the top-ranking teams participating in the competition. In particular, the ranking combination of bag-of-words and language models significantly outperformed a BM25-based baseline model (by 16 percentage points on the NDCG@20 metric), with more than 16 of the top 20 retrieved documents being relevant. The proposed pipeline could thus support the effective search and discovery of relevant information during an infodemic.
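The abstract does not state how the bag-of-words and language-model rankings are combined, so the sketch below is illustrative only. It uses reciprocal rank fusion (RRF) as a swapped-in, commonly used fusion technique, not necessarily the authors' method, and scores the fused list with NDCG@20 using linear gains and a log2 position discount. The function names, the RRF constant `k=60`, the run lists and the relevance judgments are all hypothetical.

```python
import math
from typing import Dict, List


def reciprocal_rank_fusion(rankings: List[List[str]], k: int = 60) -> List[str]:
    """Fuse ranked lists of document IDs with reciprocal rank fusion (RRF).

    Assumption: RRF stands in for the unspecified combination method.
    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    with ranks starting at 1.
    """
    scores: Dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties broken by document ID for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


def ndcg_at_k(ranking: List[str], qrels: Dict[str, int], k: int = 20) -> float:
    """NDCG@k with graded judgments (qrels maps doc ID -> relevance grade)."""

    def dcg(gains: List[int]) -> float:
        # Linear gain discounted by log2(position + 1), positions starting at 1.
        return sum(g / math.log2(i + 2) for i, g in enumerate(gains))

    gains = [qrels.get(doc_id, 0) for doc_id in ranking[:k]]
    ideal = sorted(qrels.values(), reverse=True)[:k]
    ideal_dcg = dcg(ideal)
    return dcg(gains) / ideal_dcg if ideal_dcg > 0 else 0.0


# Hypothetical runs: a bag-of-words (e.g. BM25) ranking and a neural re-ranking.
bm25_run = ["d1", "d2", "d3", "d4"]
neural_run = ["d3", "d1", "d4", "d2"]
fused = reciprocal_rank_fusion([bm25_run, neural_run])
qrels = {"d3": 2, "d1": 1}  # hypothetical graded relevance judgments
print(fused, round(ndcg_at_k(fused, qrels), 3))
```

RRF only needs rank positions, which makes it convenient when the first-stage and re-ranking scores live on different scales; a weighted sum of normalized scores is an equally plausible alternative.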
Publisher
Cold Spring Harbor Laboratory