Information extraction from weakly structured radiological reports with natural language queries

Authors:

Amin Dada, Tim Leon Ufer, Moon Kim, Max Hasin, Nicola Spieker, Michael Forsting, Felix Nensa, Jan Egger, Jens Kleesiek

Abstract

Objectives

To provide physicians and researchers with an efficient way to extract information from weakly structured radiology reports using natural language processing (NLP) machine learning models.

Methods

We evaluated seven German bidirectional encoder representations from transformers (BERT) models on a dataset of 857,783 unlabeled radiology reports and an annotated reading-comprehension dataset in the SQuAD 2.0 format, based on 1223 additional reports.

Results

Continued pre-training of a BERT model on the radiology dataset and a medical online encyclopedia yielded the most accurate model, with an F1-score of 83.97% and an exact-match score of 71.63% for answerable questions, and 96.01% accuracy in detecting unanswerable questions. Fine-tuning a non-medical model without further pre-training led to the lowest-performing model. The final model proved stable against variation in the formulation of questions and in dealing with questions on topics excluded from the training set.

Conclusions

General-domain BERT models further pre-trained on radiological data achieve high accuracy in answering questions on radiology reports. We propose integrating our approach into the workflow of medical practitioners and researchers to extract information from radiology reports.

Clinical relevance statement

By reducing the need for manual searches of radiology reports, radiologists' resources are freed up, which indirectly benefits patients.

Key Points

• BERT models pre-trained on general-domain datasets and radiology reports achieve high accuracy (83.97% F1-score) on question answering for radiology reports.
• The best-performing model achieves an F1-score of 83.97% for answerable questions and 96.01% accuracy for questions without an answer.
• Additional radiology-specific pre-training improves the performance of all investigated BERT models.
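The core method described in the abstract is extractive question answering in the SQuAD 2.0 format: a fine-tuned BERT model either marks an answer span in the report text or abstains when no answer exists. A minimal sketch of that inference step using the Hugging Face transformers pipeline follows; the checkpoint name and the German example report are illustrative placeholders, not the authors' released model or data.

```python
# Minimal sketch of SQuAD 2.0-style extractive QA over a radiology report.
# Assumptions: "bert-base-german-cased" stands in for a German BERT checkpoint
# that has already been fine-tuned for question answering; the report text and
# question are invented examples, not data from the study.
from transformers import pipeline

qa = pipeline("question-answering", model="bert-base-german-cased")

report = (
    "Befund: Kein Nachweis einer intrakraniellen Blutung. "
    "Regelrechte Darstellung der Ventrikel."
)

# handle_impossible_answer=True enables the SQuAD 2.0 behavior of predicting
# "no answer" (an empty span) for unanswerable questions.
result = qa(
    question="Liegt eine intrakranielle Blutung vor?",
    context=report,
    handle_impossible_answer=True,
)
print(result)  # e.g. {'score': ..., 'start': ..., 'end': ..., 'answer': ...}
```

In the study's setup, such a pipeline would sit on top of a model that was first further pre-trained on the unlabeled radiology reports (and a medical online encyclopedia) and then fine-tuned on the SQuAD 2.0-style annotated dataset.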

Funder

Universitätsklinikum Essen

Publisher

Springer Science and Business Media LLC

Subject

Radiology, Nuclear Medicine and Imaging; General Medicine

