Automatic RadLex coding of Chinese structured radiology reports based on text similarity ensemble-Reference-Cited by-同舟云学术

Automatic RadLex coding of Chinese structured radiology reports based on text similarity ensemble

Published:2021-11 Issue:S9 Volume:21 Page:
ISSN:1472-6947
Container-title:BMC Medical Informatics and Decision Making
language:en
Short-container-title:BMC Med Inform Decis Mak

Author:

Chen Yani,Nan Shan,Tian Qi,Cai Hailing,Duan Huilong,Lu Xudong^ORCID

Abstract

Abstract Background Standardized coding of plays an important role in radiology reports’ secondary use such as data analytics, data-driven decision support, and personalized medicine. RadLex, a standard radiological lexicon, can reduce subjective variability and improve clarity in radiology reports. RadLex coding of radiology reports is widely used in many countries, but translation and localization of RadLex in China are far from being established. Although automatic RadLex coding is a common way for non-standard radiology reports, the high-accuracy cross-language RadLex coding is hardly achieved due to the limitation of up-to-date auto-translation and text similarity algorithms and still requires further research. Methods We present an effective approach that combines a hybrid translation and a Multilayer Perceptron weighting text similarity ensemble algorithm for automatic RadLex coding of Chinese structured radiology reports. Firstly, a hybrid way to integrate Google neural machine translation and dictionary translation helps to optimize the translation of Chinese radiology phrases to English. The dictionary is made up of 21,863 Chinese–English radiological term pairs extracted from several free medical dictionaries. Secondly, four typical text similarity algorithms are introduced, which are Levenshtein distance, Jaccard similarity coefficient, Word2vec Continuous bag-of-words model, and WordNet Wup similarity algorithms. Lastly, the Multilayer Perceptron model has been used to synthesize the contextual, lexical, character and syntactical information of four text similarity algorithms to promote precision, in which four similarity scores of two terms are taken as input and the output presents whether the two terms are synonyms. Results The results show the effectiveness of the approach with an F1-score of 90.15%, a precision of 91.78% and a recall of 88.59%. The hybrid translation algorithm has no negative effect on the final coding, F1-score has increased by 21.44% and 8.12% compared with the GNMT algorithm and dictionary translation. Compared with the single similarity, the result of the MLP weighting similarity algorithm is satisfactory that has a 4.48% increase compared with the best single similarity algorithm, WordNet Wup. Conclusions The paper proposed an innovative automatic cross-language RadLex coding approach to solve the standardization of Chinese structured radiology reports, that can be taken as a reference to automatic cross-language coding.

Funder

National Key Research and development Program of China

Publisher

Springer Science and Business Media LLC

Subject

Health Informatics,Health Policy,Computer Science Applications

Link

https://link.springer.com/content/pdf/10.1186/s12911-021-01604-9.pdf

Reference35 articles.

1. Ganeshan D, Duong P-AT, Probyn L, Lenchik L, McArthur TA, Retrouvey M, Ghobadi EH, Desouches SL, Pastel D, Francis IR. Structured reporting in radiology. Acad Radiol. 2018;25(1):66–73.

2. Cramer JA, Eisenmenger LB, Pierson NS, Dhatt HS, Heilbrun ME. Structured and templated reporting: an overview. Appl Radiol. 2014;43(8):18–21.

3. Langlotz CP. RadLex: a new method for indexing online educational materials. Radiol Soc North Am. 2006;26:1595–7.

4. Stanfill MH, Williams M, Fenton SH, Jenders RA, Hersh WR. A systematic literature review of automated clinical coding and classification systems. J Am Med Inform Assoc. 2010;17(6):646–51.

5. Pereira S, Névéol A, Massari P, Joubert M, Darmoni S. Construction of a semi-automated ICD-10 coding help system to optimize medical and economic coding. Stud Health Technol Inform. 2006;124:845–50.

Cited by 3 articles. 订阅此论文施引文献订阅此论文施引文献，注册后可以免费订阅5篇论文的施引文献，订阅后可以查看论文全部施引文献

1. Predicting abiotic stress-responsive miRNA in plants based on multi-source features fusion and graph neural network;Plant Methods;2024-02-24

2. Text Similarity Calculation Method Based on Optimized Cosine Distance;2022 International Conference on Electronics and Devices, Computational Science (ICEDCS);2022-09

3. A Novel Discrimination Structure for Assessing Text Semantic Similarity;網際網路技術學刊;2022-07