A Systematic Approach to Configuring MetaMap for Optimal Performance

Author:

Jing Xia (1), Indani Akash (2), Hubig Nina (2), Min Hua (3), Gong Yang (4), Cimino James J. (5), Sittig Dean F. (4), Rennert Lior (1), Robinson David (6), Biondich Paul (7), Wright Adam (8), Nøhr Christian (9), Law Timothy (10), Faxvaag Arild (11), Gimbel Ronald (1)

Affiliation:

1. Department of Public Health Sciences, College of Behavioral, Social and Health Sciences, Clemson University, Clemson, South Carolina, United States

2. School of Computing, College of Engineering, Computing and Applied Sciences, Clemson University, Clemson, South Carolina, United States

3. Department of Health Administration and Policy, College of Health and Human Services, George Mason University, Fairfax, Virginia, United States

4. School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, Texas, United States

5. Informatics Institute, The University of Alabama at Birmingham, Birmingham, Alabama, United States

6. Independent Consultant, Cumbria, United Kingdom

7. Department of Pediatrics, Clem McDonald Biomedical Informatics Center, Regenstrief Institute, Indiana University School of Medicine, Indianapolis, Indiana, United States

8. Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, Tennessee, United States

9. Department of Planning, Faculty of Engineering, Aalborg University, Aalborg, Denmark

10. Ohio Musculoskeletal and Neurologic Institute, Ohio University, Athens, Ohio, United States

11. Department of Neuromedicine and Movement Science, Faculty of Medicine and Health Sciences, Norwegian University of Science and Technology, Trondheim, Norway

Abstract

Background: MetaMap is a valuable tool for processing biomedical texts to identify concepts. Although MetaMap is highly configurable, configuration decisions are not straightforward.

Objective: To develop a systematic, data-driven methodology for configuring MetaMap for optimal performance.

Methods: MetaMap, the word2vec model, and the phrase model were used to build a pipeline. The phrase and word2vec models were trained, without supervision, on abstracts related to clinical decision support. During testing, MetaMap was run with its default settings, with one behavior option, and with two behavior options. For each configuration, cosine and soft cosine similarity scores between the identified entities and the gold-standard terms were computed for 40 annotated abstracts (422 sentences). The similarity scores were used to calculate and compare, for each configuration, the overall percentages of exact matches, similar matches, and missing gold-standard terms across the abstracts. The results were manually spot-checked. Precision, recall, and F-measure (β = 1) were calculated.

Results: The percentages of exact matches and missing gold-standard terms were 0.6–0.79 and 0.09–0.3, respectively, for one behavior option, and 0.56–0.8 and 0.09–0.3, respectively, for two behavior options. The percentages of exact matches and missing terms based on soft cosine similarity scores exceeded those based on cosine similarity scores. The average precision, recall, and F-measure were 0.59, 0.82, and 0.68, respectively, for exact matches, and 1.00, 0.53, and 0.69, respectively, for missing terms.

Conclusion: We demonstrated a systematic approach that provides objective and accurate evidence to guide MetaMap configuration for optimal performance. Combining this objective evidence with the current practice of relying on principles, experience, and intuition outperforms either strategy alone when configuring MetaMap. Our methodology, reference code, measurements, results, and workflow are valuable references for optimizing and configuring MetaMap.
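
The abstract compares cosine and soft cosine similarity between MetaMap-identified entities and gold-standard terms. Below is a minimal sketch of both measures, not the authors' reference code. It assumes bag-of-words count vectors over a shared vocabulary and a term-similarity matrix S; in a pipeline like the one described, S would plausibly be derived from word2vec similarities between terms, but the values here are made up for illustration. Soft cosine is a'Sb / sqrt(a'Sa * b'Sb); when S is the identity matrix it reduces to ordinary cosine similarity.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two term-count vectors (0.0 for a zero vector)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def soft_cosine(a: Sequence[float], b: Sequence[float], s: List[List[float]]) -> float:
    """Soft cosine similarity a'Sb / sqrt(a'Sa * b'Sb).

    s[i][j] is the similarity between vocabulary terms i and j, with s[i][i] == 1.
    Entries are assumed nonnegative (e.g., negative word2vec similarities clipped to 0).
    """
    def bilinear(x: Sequence[float], y: Sequence[float]) -> float:
        return sum(x[i] * s[i][j] * y[j] for i in range(len(x)) for j in range(len(y)))

    norm = math.sqrt(bilinear(a, a) * bilinear(b, b))
    return bilinear(a, b) / norm if norm else 0.0


# Toy vocabulary (index order) and hypothetical term similarities.
vocab = ["decision", "support", "aid"]
S = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.8],  # "support" ~ "aid" (made-up value)
    [0.0, 0.8, 1.0],
]
entity = [1, 0, 1]  # "decision aid"
gold = [1, 1, 0]    # "decision support"

print(round(cosine(entity, gold), 3))          # 0.5
print(round(soft_cosine(entity, gold, S), 3))  # 0.9
```

Because the toy matrix treats "support" and "aid" as related, the soft cosine score (0.9) is higher than the plain cosine score (0.5), which only credits the shared word "decision".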
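
The test design described in the Methods (default settings, one behavior option, two behavior options) amounts to enumerating the default run, every single option, and every pair of options. A short sketch using the standard library's itertools.combinations follows; the option names are hypothetical placeholders, not actual MetaMap flags, and how each configuration is passed to MetaMap is deliberately left out.

```python
from itertools import combinations
from typing import List, Sequence, Tuple


def enumerate_configurations(options: Sequence[str]) -> List[Tuple[str, ...]]:
    """Return the default run, every single behavior option, and every pair of options."""
    configs: List[Tuple[str, ...]] = [()]      # () stands for MetaMap's default settings
    configs += [(opt,) for opt in options]     # one behavior option
    configs += list(combinations(options, 2))  # two behavior options
    return configs


# Hypothetical placeholders; substitute the behavior options actually under study.
options = ["OPTION_A", "OPTION_B", "OPTION_C"]
for config in enumerate_configurations(options):
    print(config or "default")
```

For n options this yields 1 + n + n(n - 1)/2 runs, so the number of two-option runs grows quadratically as more options are considered.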
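
The similarity scores are then turned into exact matches, similar matches, and missing gold-standard terms. The abstract does not state the cut-offs, so the thresholds below (EXACT_THRESHOLD, SIMILAR_THRESHOLD) are hypothetical, as is the token-overlap (Jaccard) function used as a stand-in scorer; either similarity measure from the pipeline could be plugged in instead. In this sketch, each gold-standard term is labeled by its best score against the entities identified in the same text.

```python
from typing import Callable, Dict, Iterable

EXACT_THRESHOLD = 0.99   # hypothetical cut-off for an exact match
SIMILAR_THRESHOLD = 0.5  # hypothetical cut-off for a similar match


def jaccard(x: str, y: str) -> float:
    """Toy stand-in scorer: token-set overlap between two terms."""
    sx, sy = set(x.lower().split()), set(y.lower().split())
    union = sx | sy
    return len(sx & sy) / len(union) if union else 0.0


def classify_gold_terms(gold_terms: Iterable[str], entities: Iterable[str],
                        similarity: Callable[[str, str], float]) -> Dict[str, str]:
    """Label each gold term 'exact', 'similar', or 'missing' by its best entity score."""
    entity_list = list(entities)
    labels: Dict[str, str] = {}
    for term in gold_terms:
        best = max((similarity(term, e) for e in entity_list), default=0.0)
        if best >= EXACT_THRESHOLD:
            labels[term] = "exact"
        elif best >= SIMILAR_THRESHOLD:
            labels[term] = "similar"
        else:
            labels[term] = "missing"
    return labels


def proportions(labels: Iterable[str]) -> Dict[str, float]:
    """Fraction of labels in each category (all 0.0 if there are no labels)."""
    counts = {"exact": 0, "similar": 0, "missing": 0}
    for label in labels:
        counts[label] += 1
    total = sum(counts.values())
    return {k: (v / total if total else 0.0) for k, v in counts.items()}


gold = ["clinical decision support", "alert fatigue", "order sets"]
found = ["clinical decision support", "alert"]
labels = classify_gold_terms(gold, found, jaccard)
print(labels)
print(proportions(labels.values()))
```

In this toy run, one gold term is matched exactly, "alert fatigue" is only a similar match (Jaccard 0.5 against "alert"), and "order sets" is missing. Concatenating the label lists from every abstract before calling proportions would give overall percentages for one configuration.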
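
Precision, recall, and F-measure follow the usual definitions, with F_β = (1 + β²)·P·R / (β²·P + R), which for β = 1 is the harmonic mean of P and R. The sketch below is minimal. How true positives, false positives, and false negatives were counted (for example, automated match labels checked against the manual spot-checks) is an assumption, since the abstract does not spell it out, and the counts in the example are hypothetical. If the reported figures are averages of per-abstract scores, the average F-measure need not equal the F-measure computed from the average precision and recall.

```python
from typing import Tuple


def precision_recall_f(tp: int, fp: int, fn: int, beta: float = 1.0) -> Tuple[float, float, float]:
    """Precision, recall, and F-beta from confusion counts (0.0 where undefined)."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    b2 = beta * beta
    denom = b2 * precision + recall
    f = (1 + b2) * precision * recall / denom if denom else 0.0
    return precision, recall, f


# Hypothetical counts for one abstract under one configuration.
p, r, f = precision_recall_f(tp=6, fp=4, fn=2)
print(f"P={p:.2f} R={r:.2f} F1={f:.2f}")  # P=0.60 R=0.75 F1=0.67
```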

Funder

National Institute of General Medical Sciences of the National Institutes of Health

Publisher

Georg Thieme Verlag KG

Subject

Health Information Management, Advanced and Specialized Nursing, Health Informatics

Cited by 1 article.
