Automated confidence ranked classification of randomized controlled trial articles: an aid to evidence-based medicine

Author:

Cohen Aaron M (1), Smalheiser Neil R (2), McDonagh Marian S (1), Yu Clement (3), Adams Clive E (4), Davis John M (2), Yu Philip S (3)

Affiliation:

1. Department of Medical Informatics and Clinical Epidemiology, Oregon Health & Science University, Portland, OR 97239 USA

2. Department of Psychiatry, University of Illinois at Chicago, Chicago, IL 60612 USA

3. Department of Computer Science, University of Illinois at Chicago, Chicago, IL 60612 USA

4. Division of Psychiatry, University of Nottingham, Nottingham, UK

Abstract

Objective: For many literature review tasks, including systematic review (SR) and other aspects of evidence-based medicine, it is important to know whether an article describes a randomized controlled trial (RCT). Current manual annotation is neither complete nor flexible enough for the SR process. In this work, highly accurate machine learning predictive models were built that include confidence predictions of whether an article is an RCT.

Materials and Methods: The LibSVM classifier was used with forward selection of potential feature sets on a large human-related subset of MEDLINE to create a classification model requiring only the citation, abstract, and MeSH terms for each article.

Results: The model achieved an area under the receiver operating characteristic curve of 0.973 and a mean squared error of 0.013 on the held-out year 2011 data. Accurate confidence estimates were confirmed on a manually reviewed set of test articles. A second model not requiring MeSH terms was also created, and it performed almost as well.

Discussion: Both models accurately rank and predict article RCT confidence. Using the model and the manually reviewed samples, it is estimated that about 8000 (3%) additional RCTs can be identified in MEDLINE, and that 5% of articles tagged as RCTs in MEDLINE may not be identified correctly.

Conclusion: Retagging human-related studies with a continuously valued RCT confidence is potentially more useful for article ranking and review than a simple yes/no prediction. The automated RCT tagging tool should offer significant savings of time and effort during the process of writing SRs, and is a key component of a multistep text mining pipeline that we are building to streamline SR workflow. In addition, the model may be useful for identifying errors in MEDLINE publication types.
The RCT confidence predictions described here have been made available to users as a web service with a user query form front end at: http://arrowsmith.psych.uic.edu/cgi-bin/arrowsmith_uic/RCT_Tagger.cgi.
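
The two evaluation figures reported in the abstract, area under the receiver operating characteristic curve (AUC) and mean squared error of the confidence predictions, can be illustrated with a minimal sketch. This is not the authors' evaluation code: the function names `auc` and `mean_squared_error` are hypothetical, and the toy labels and scores are invented purely for illustration. It assumes each article has a true label (1 for RCT, 0 otherwise) and a predicted confidence between 0 and 1.

```python
def auc(labels, scores):
    """AUC via the rank-sum (Mann-Whitney) formulation, with tied scores
    receiving their average rank."""
    pairs = sorted(zip(scores, labels))
    n = len(pairs)
    ranks = [0.0] * n
    i = 0
    while i < n:
        # Find the run of equal scores starting at position i.
        j = i
        while j + 1 < n and pairs[j + 1][0] == pairs[i][0]:
            j += 1
        average_rank = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[k] = average_rank
        i = j + 1
    n_pos = sum(label for _, label in pairs)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC needs at least one positive and one negative")
    rank_sum_pos = sum(r for r, (_, label) in zip(ranks, pairs) if label == 1)
    return (rank_sum_pos - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def mean_squared_error(labels, scores):
    """Mean squared difference between predicted confidence and true label."""
    return sum((s - y) ** 2 for y, s in zip(labels, scores)) / len(labels)


if __name__ == "__main__":
    # Invented toy data: 1 = RCT, 0 = not an RCT.
    toy_labels = [1, 1, 0, 0, 1, 0]
    toy_scores = [0.9, 0.8, 0.3, 0.1, 0.4, 0.6]
    print("AUC:", auc(toy_labels, toy_scores))
    print("MSE:", mean_squared_error(toy_labels, toy_scores))
```

AUC measures how well the scores rank RCTs above non-RCTs, while the mean squared error measures how close each confidence value is to the true label, which is why the abstract reports both when assessing a continuously valued confidence.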

Funder

National Institutes of Health/National Library of Medicine

Publisher

Oxford University Press (OUP)

Subject

Health Informatics
