Deep-learning-based automated terminology mapping in OMOP-CDM

Author:

Kang Byungkon (1), Yoon Jisang (2), Kim Ha Young (2), Jo Sung Jin (3), Lee Yourim (4), Kam Hye Jin (5)

Affiliation:

1. Department of Computer Science, State University of New York, Incheon, South Korea

2. Graduate School of Information, Yonsei University, Seoul, South Korea

3. Department of Industrial and Management Engineering, Pohang University of Science and Technology, Pohang, North Gyeongsang, South Korea

4. RWE Analytics, EvidNet, Seongnam-si, Gyeonggi-do, South Korea

5. Healthcare, Life Solution Cluster, New Business Unit, Hanwha Life, Seoul, South Korea

Abstract

Objective: Accessing medical data from multiple institutions is difficult because vocabularies differ between institutions. Standardization schemes, such as the common data model, have been proposed as solutions to this problem, but such schemes require expensive human supervision. This study aims to construct a trainable system that automates semantic interinstitutional code mapping.

Materials and Methods: To automate mapping between source and target codes, we compute the embedding-based semantic similarity between their corresponding descriptive sentences. We also implement a systematic approach for preparing training data for similarity computation. Experimental results are compared to those of traditional word-based mappings.

Results: The proposed model is compared against Usagi, the state-of-the-art automated matching system of the Observational Medical Outcomes Partnership common data model. By incorporating multiple negative training samples per positive sample, our semantic matching method significantly outperforms Usagi. Its matching accuracy is at least 10% greater than that of Usagi, and this trend is consistent across various top-k measurements.

Discussion: The proposed deep-learning-based mapping approach outperforms previous simple word-level matching algorithms because it can account for contextual and semantic information. Additionally, we demonstrate that the manner in which negative training samples are selected significantly affects the overall performance of the system.

Conclusion: Incorporating the semantics of code descriptions yields significantly higher matching accuracy than traditional text co-occurrence-based approaches. The negative training sample collection methodology is also an important component of the proposed trainable system and can be adopted in both present and future related systems.
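The abstract describes ranking target codes by embedding similarity and scoring the result with top-k accuracy. The sketch below illustrates only that evaluation idea and is not the authors' model: it assumes each code description has already been turned into a vector by some sentence encoder, and every name in it (`cosine`, `top_k_matches`, `top_k_accuracy`, the toy codes and vectors) is hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def top_k_matches(source_vec, target_vecs, k):
    """Return the k target codes most similar to one source embedding."""
    scored = [(cosine(source_vec, vec), code) for code, vec in target_vecs.items()]
    # Highest similarity first; ties broken by code for determinism.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [code for _, code in scored[:k]]


def top_k_accuracy(source_vecs, target_vecs, gold, k):
    """Fraction of source codes whose gold target appears in the top k."""
    hits = sum(
        1
        for code, vec in source_vecs.items()
        if gold[code] in top_k_matches(vec, target_vecs, k)
    )
    return hits / len(source_vecs)


# Toy, hand-made vectors standing in for encoder outputs (assumption).
target_vecs = {"T1": [1.0, 0.0, 0.0], "T2": [0.0, 1.0, 0.0], "T3": [0.7, 0.7, 0.0]}
source_vecs = {"S1": [0.9, 0.1, 0.0], "S2": [0.1, 0.9, 0.0]}
gold = {"S1": "T1", "S2": "T3"}

print(top_k_accuracy(source_vecs, target_vecs, gold, k=1))
print(top_k_accuracy(source_vecs, target_vecs, gold, k=2))
```

In this toy setup the gold target for `S2` is only the second-best match, so accuracy rises when k goes from 1 to 2, which is why results are commonly reported across several values of k.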

Funder

Ministry of Science and ICT

ICT Consilience Creative Program

Institute for Information & Communications Technology Planning & Evaluation

Korea Institute of Energy Technology Evaluation and Planning

Korean government

Holistic Performance Testing and Evaluation Methods

Field Verifications

Publisher

Oxford University Press (OUP)

Subject

Health Informatics

