Mapping Chinese Medical Entities to the Unified Medical Language System

Authors:

Chen Luming (1,2), Qi Yifan (3,4), Wu Aiping (3,4), Deng Lizong (3,4), Jiang Taijiao (1,2)

Affiliations:

1. Guangzhou Laboratory, Guangzhou, China.

2. Guangzhou Medical University, Guangzhou, China.

3. Institute of Systems Medicine, Chinese Academy of Medical Sciences & Peking Union Medical College, Beijing, China.

4. Suzhou Institute of Systems Medicine, Suzhou, China.

Abstract

Background: Chinese medical entities have not been organized comprehensively because well-developed terminology systems are lacking, which makes it difficult to process Chinese medical texts for fine-grained medical knowledge representation. Mapping Chinese medical entities to their English counterparts in the Unified Medical Language System (UMLS) is an efficient way to unify Chinese medical terminologies. However, such mappings have not been sufficiently investigated in previous research. In this study, we explore strategies for mapping Chinese medical entities to the UMLS and systematically evaluate the mapping performance.

Methods: First, Chinese medical entities are translated into English using multiple web-based translation engines. Then, three mapping strategies are investigated: (a) string-based, (b) semantic-based, and (c) string and semantic similarity combined. In addition, cross-lingual pretrained language models are applied to map Chinese medical entities to UMLS concepts without translation. All of these strategies are evaluated on the ICD10-CN, Chinese Human Phenotype Ontology (CHPO), and RealWorld datasets.

Results: The linear combination method based on the SapBERT and term frequency-inverse document frequency (TF-IDF) bag-of-words models performs best on all evaluation datasets, with top-5 accuracies of 91.85%, 82.44%, and 78.43% on the ICD10-CN, CHPO, and RealWorld datasets, respectively.

Conclusions: We explore strategies for mapping Chinese medical entities to the UMLS and identify a satisfactory linear combination method. Our investigation will facilitate Chinese medical entity normalization and inspire research on Chinese medical ontology development.
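As an illustration of the combined strategy, the sketch below ranks candidate concept names for an already translated entity by a weighted sum of a TF-IDF bag-of-words cosine similarity and a semantic similarity score. It is a minimal sketch under stated assumptions, not the authors' implementation: the abstract does not give the weighting scheme, so the weight `alpha`, the whitespace tokenizer, the smoothed IDF, and the function names are all hypothetical. The semantic scores are passed in as precomputed numbers standing in for the cosine similarity of SapBERT embeddings, and the candidate names and scores in the usage example are made up for illustration.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (an assumed, deliberately simple tokenizer)."""
    return text.lower().split()


def tfidf_vectors(terms):
    """Build sparse TF-IDF bag-of-words vectors (dicts) for a list of terms."""
    counts = [Counter(tokenize(t)) for t in terms]
    doc_freq = Counter(word for c in counts for word in c)
    total = len(terms)
    # Smoothed IDF so that words shared by every term keep a non-zero weight.
    idf = {w: math.log((1 + total) / (1 + df)) + 1 for w, df in doc_freq.items()}
    return [{w: tf * idf[w] for w, tf in c.items()} for c in counts]


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * b.get(word, 0.0) for word, weight in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_candidates(query, candidates, semantic_scores, alpha=0.5, k=5):
    """Return the top-k candidates by alpha * semantic + (1 - alpha) * string score.

    semantic_scores maps each candidate to a precomputed similarity in [0, 1];
    here it is a placeholder for an embedding-based score such as SapBERT cosine.
    """
    vectors = tfidf_vectors([query] + list(candidates))
    query_vec, cand_vecs = vectors[0], vectors[1:]
    scored = []
    for cand, vec in zip(candidates, cand_vecs):
        string_sim = cosine(query_vec, vec)
        semantic_sim = semantic_scores.get(cand, 0.0)
        scored.append((alpha * semantic_sim + (1 - alpha) * string_sim, cand))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [cand for _, cand in scored[:k]]


def top_k_accuracy(predictions, gold):
    """Fraction of queries whose gold concept appears in its top-k prediction list."""
    hits = sum(1 for preds, g in zip(predictions, gold) if g in preds)
    return hits / len(gold) if gold else 0.0


# Hypothetical usage: names and semantic scores below are illustrative only.
candidates = ["Myocardial infarction", "Cerebral infarction", "Myocarditis"]
semantic = {
    "Myocardial infarction": 0.92,
    "Cerebral infarction": 0.55,
    "Myocarditis": 0.60,
}
ranked = rank_candidates("myocardial infarction", candidates, semantic, alpha=0.5, k=2)
accuracy = top_k_accuracy([ranked], ["Myocardial infarction"])
```

Setting `alpha` to 0 or 1 reduces the ranking to the purely string-based or purely semantic-based strategy, which makes the three strategies easy to compare with the same top-k accuracy function.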

Funder

National Key Research and Development Program of China

CAMS Innovation Fund for Medical Sciences

Publisher

American Association for the Advancement of Science (AAAS)

Subject

Multidisciplinary


Cited by 1 article.
