Automatic symptom name normalization in clinical records of traditional Chinese medicine

Author:

Wang Yaqiang, Yu Zhonghua, Jiang Yongguang, Xu Kaikuo, Chen Xia

Abstract

Background: In recent years, data mining technology has been applied more widely than ever in the field of traditional Chinese medicine (TCM) to discover regularities in the experience accumulated in China over thousands of years. Electronic medical records (clinical records) of TCM could be an important source of valuable regularities, because they contain more information than the well-structured prescription data extracted manually from TCM literature, such as information about the course of medical treatment. However, these records are collected by TCM doctors on a day-to-day basis without the support of an authoritative editorial board, and because TCM doctors differ in experience and background, the same concept may be described with several different terms. Therefore, clinical records of TCM cannot be used directly for data mining and knowledge discovery. This paper focuses on the phenomenon of "one symptom with different names" and investigates a series of metrics for automatically normalizing symptom names in clinical records of TCM.

Results: A series of extensive experiments was performed to validate the proposed metrics. They showed that the hybrid similarity metrics, which integrate literal similarity and remedy-based similarity, are more accurate than metrics based on literal similarity or remedy-based similarity alone. The highest F-measure of all the metrics (65.62%) was achieved by the hybrid similarity metric VSM+TFIDF+SWD.

Conclusions: Automatic symptom name normalization is an essential task for discovering knowledge from clinical data of TCM, and this paper is the first to introduce the problem. The results verify that the investigated metrics are reasonable and accurate, and that the hybrid similarity metrics perform much better than metrics based on literal similarity or remedy-based similarity alone.
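The abstract names the ingredients of the metrics (literal similarity between names, remedy-based similarity, a vector space model with TF-IDF weighting, and the F-measure used for evaluation) but not their formulas. The Python sketch below shows one plausible way these pieces fit together, under loudly stated assumptions: literal similarity is a character-set Jaccard score (a swapped-in stand-in, not the paper's literal metric); remedy-based similarity is the cosine between TF-IDF weighted vectors of the herbs prescribed in records that mention each symptom; the hybrid is a weighted sum with a hypothetical weight `alpha`; and the acceptance `threshold` is likewise hypothetical. SWD is not reproduced because the abstract does not define it. The toy records are invented for illustration and are not data from the paper.

```python
import math
from collections import Counter, defaultdict
from itertools import combinations

# ASSUMPTION: a sketch only. Literal metric = character-set Jaccard (stand-in);
# remedy metric = cosine over TF-IDF herb vectors; hybrid = weighted sum.
# SWD from the paper is not implemented here.


def literal_similarity(a: str, b: str) -> float:
    """Jaccard similarity over the character sets of two symptom names."""
    x, y = set(a), set(b)
    if not x and not y:
        return 0.0
    return len(x & y) / len(x | y)


def remedy_vectors(records):
    """Build a TF-IDF weighted herb vector for every symptom.

    records: list of (symptoms, herbs) pairs, one per clinical record.
    Each symptom is treated as a 'document' whose 'terms' are the herbs
    prescribed in records mentioning it (an assumption of this sketch).
    """
    tf = defaultdict(Counter)
    for symptoms, herbs in records:
        for s in symptoms:
            tf[s].update(herbs)
    n = len(tf)
    # Document frequency: number of symptoms whose vector contains the herb.
    df = Counter(h for counts in tf.values() for h in counts)
    # Smoothed IDF keeps every weight positive, even for ubiquitous herbs.
    return {
        s: {h: c * (math.log((1 + n) / (1 + df[h])) + 1) for h, c in counts.items()}
        for s, counts in tf.items()
    }


def cosine(u: dict, v: dict) -> float:
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(w * v.get(h, 0.0) for h, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def hybrid_similarity(a: str, b: str, vectors: dict, alpha: float = 0.5) -> float:
    """Weighted sum of literal and remedy-based similarity (alpha is hypothetical)."""
    remedy = cosine(vectors.get(a, {}), vectors.get(b, {}))
    return alpha * literal_similarity(a, b) + (1 - alpha) * remedy


def synonym_pairs(vectors: dict, threshold: float = 0.6, alpha: float = 0.5):
    """Return symptom pairs whose hybrid similarity reaches the threshold."""
    return [
        (a, b)
        for a, b in combinations(sorted(vectors), 2)
        if hybrid_similarity(a, b, vectors, alpha) >= threshold
    ]


def f_measure(predicted, gold) -> float:
    """Harmonic mean of precision and recall over unordered synonym pairs."""
    predicted = {frozenset(p) for p in predicted}
    gold = {frozenset(p) for p in gold}
    tp = len(predicted & gold)
    if tp == 0:
        return 0.0
    precision = tp / len(predicted)
    recall = tp / len(gold)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Invented toy records: (symptom names, herbs prescribed).
    records = [
        (["头痛", "恶寒"], ["麻黄", "桂枝"]),
        (["头疼"], ["麻黄", "桂枝"]),
        (["失眠"], ["酸枣仁", "茯苓"]),
    ]
    vecs = remedy_vectors(records)
    pairs = synonym_pairs(vecs)
    print(pairs, f_measure(pairs, [("头痛", "头疼")]))
```

On these toy records, 头痛 and 恶寒 are always prescribed the same herbs, so remedy-based similarity alone cannot separate a variant name (头疼) from a co-occurring but different symptom (恶寒). Adding the literal term ranks 头疼 higher (about 0.67 versus 0.5 with `alpha=0.5`), which illustrates, on toy data only, why combining the two signals can help.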

Publisher

Springer Science and Business Media LLC

Subject

Applied Mathematics, Computer Science Applications, Molecular Biology, Biochemistry, Structural Biology

