Author:
Gu Jinghang, Xiang Rong, Wang Xing, Li Jing, Li Wenjie, Qian Longhua, Zhou Guodong, Huang Chu-Ren
Abstract
Background: The COVID-19 pandemic has increasingly accelerated the publication pace of scientific literature. How to efficiently curate and index this large amount of biomedical literature under the current crisis is of great importance. Previous literature indexing is mainly performed by human experts using Medical Subject Headings (MeSH), which is labor-intensive and time-consuming. Therefore, to alleviate the expensive time consumption and monetary cost, there is an urgent need for automatic semantic indexing technologies for the emerging COVID-19 domain.

Results: In this research, to investigate the semantic indexing problem for COVID-19, we first construct the new COVID-19 Semantic Indexing dataset, which consists of more than 80 thousand biomedical articles. We then propose a novel semantic indexing framework based on the multi-probe attention neural network (MPANN) to address the COVID-19 semantic indexing problem. Specifically, we employ a k-nearest neighbour based MeSH masking approach to generate candidate topic terms for each input article. We encode and feed the selected candidate terms as well as other contextual information as probes into the downstream attention-based neural network. Each semantic probe carries specific aspects of biomedical knowledge and provides informatively discriminative features for the input article. After extracting the semantic features at both term-level and document-level through the attention-based neural network, MPANN adopts a linear multi-view classifier to conduct the final topic prediction for COVID-19 semantic indexing.

Conclusion: The experimental results suggest that MPANN promises to represent the semantic features of biomedical texts and is effective in predicting semantic topics for COVID-19 related biomedical articles.
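The k-nearest neighbour based MeSH masking step mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not specify how articles are embedded, which similarity measure is used, or how neighbour terms are combined, so everything below is an assumption: articles are toy 3-dimensional vectors, similarity is cosine, and the candidate set is the union of the MeSH terms of the k most similar indexed articles, ranked by how many neighbours carry each term. The function names `cosine` and `knn_candidate_terms` and the sample data are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def knn_candidate_terms(query_vec, corpus_vecs, corpus_terms, k=2):
    """Return candidate MeSH terms for a query article.

    Assumption: candidates are the terms attached to the k most similar
    already-indexed articles; all other terms are masked out (not returned).
    Terms are ordered by neighbour count, then alphabetically.
    """
    ranked = sorted(
        range(len(corpus_vecs)),
        key=lambda i: cosine(query_vec, corpus_vecs[i]),
        reverse=True,
    )
    counts = Counter()
    for i in ranked[:k]:
        counts.update(set(corpus_terms[i]))
    return [term for term, _ in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))]


# Toy data (hypothetical): four indexed articles with made-up embeddings.
corpus_vecs = [
    [1.0, 0.0, 0.0],
    [0.9, 0.1, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]
corpus_terms = [
    ["COVID-19", "Pneumonia"],
    ["COVID-19", "SARS-CoV-2"],
    ["Vaccines"],
    ["Telemedicine"],
]

candidates = knn_candidate_terms([1.0, 0.05, 0.0], corpus_vecs, corpus_terms, k=2)
print(candidates)
```

In this sketch the returned list plays the role of the candidate topic terms that would then be encoded as probes; terms carried only by distant articles never reach the downstream classifier, which narrows the label space the model has to consider.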
Funder
Hong Kong Polytechnic University
National Outstanding Youth Science Fund Project of National Natural Science Foundation of China
National Natural Science Foundation of China
Publisher
Springer Science and Business Media LLC
Subject
Applied Mathematics, Computer Science Applications, Molecular Biology, Biochemistry, Structural Biology
Cited by
2 articles.