Using a Secure, Continually Updating, Web Source Processing Pipeline to Support the Real-Time Data Synthesis and Analysis of Scientific Literature: Development and Validation Study

Authors

Vaghela Uddhav, Rabinowicz Simon, Bratsos Paris, Martin Guy, Fritzilas Epameinondas, Markar Sheraz, Purkayastha Sanjay, Stringer Karl, Singh Harshdeep, Llewellyn Charlie, Dutta Debabrata, Clarke Jonathan M, Howard Matthew, Serban Ovidiu, Kinross James

Abstract

Background: The scale and quality of the global scientific response to the COVID-19 pandemic have unquestionably saved lives. However, the pandemic has also triggered an unprecedented “infodemic”: the velocity and volume of data production have overwhelmed key stakeholders such as clinicians and policy makers, who have been unable to process structured and unstructured data for evidence-based decision making. Existing solutions to this data synthesis challenge cannot capture heterogeneous web data in real time to produce concomitant answers, and they do not base their responses to free-text queries on high-quality information.

Objective: The main objective of this project is to build a generic, real-time, continuously updating curation platform that can support the data synthesis and analysis of a scientific literature framework. Our secondary objective is to validate this platform and the curation methodology for COVID-19–related medical literature by expanding the COVID-19 Open Research Dataset via the addition of new, unstructured data.

Methods: To create an infrastructure that addresses our objectives, the PanSurg Collaborative at Imperial College London has developed a unique data pipeline based on a web crawler extraction methodology. This data pipeline uses a novel curation methodology that adopts a human-in-the-loop approach for the characterization of quality, relevance, and key evidence across a range of scientific literature sources.

Results: REDASA (Realtime Data Synthesis and Analysis) is now one of the world’s largest and most up-to-date sources of COVID-19–related evidence; it consists of 104,000 documents. By capturing curators’ critical appraisal methodologies through the discrete labeling and rating of information, REDASA rapidly developed a foundational, pooled, data science data set of over 1400 articles in under 2 weeks. These articles provide COVID-19–related information and represent around 10% of all papers about COVID-19.

Conclusions: This data set can act as ground truth for the future implementation of a live, automated systematic review. The three benefits of REDASA’s design are as follows: (1) it adopts a user-friendly, human-in-the-loop methodology by embedding an efficient curation platform into a natural language processing search engine; (2) it provides a curated data set in the JavaScript Object Notation format of experienced academic reviewers’ critical appraisal choices and decision-making methodologies; and (3) due to the wide scope and depth of its web crawling method, REDASA has already captured one of the world’s largest COVID-19–related data corpora for searches and curation.

Publisher

JMIR Publications Inc.

Subject

Health Informatics

