Abstract
Background
Artificial intelligence can assist in interpreting chest X-ray radiography (CXR) data, but large datasets require efficient image annotation. The purpose of this study is to extract CXR labels from diagnostic reports using natural language processing, train convolutional neural networks (CNNs), and evaluate the classification performance of the CNNs using CXR data from multiple centers.
Methods
We collected the CXR images and corresponding radiology reports of 74,082 subjects as the training dataset. Linguistic entities and relationships were extracted from the unstructured radiology reports with a bidirectional encoder representations from transformers (BERT) model, and a knowledge graph was constructed to represent the associations between abnormal-sign image labels and the CXR report text. A 25-label classification system was then built to train and test the CNN models with weakly supervised labeling.
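The abstract does not specify the CNN architecture or training configuration; the following is only a minimal PyTorch sketch of the weakly supervised multi-label setup it describes, in which each image can carry several of the 25 abnormal-sign labels at once. The ResNet-50 backbone, batch size, and dummy tensors are assumptions for illustration, not the study's actual design.

```python
import torch
import torch.nn as nn
import torchvision.models as models

NUM_LABELS = 25  # the 25 abnormal-sign labels described above

# Backbone is an assumption; the paper's exact CNN is not given in the abstract.
model = models.resnet50(weights=None)
model.fc = nn.Linear(model.fc.in_features, NUM_LABELS)

# Multi-label targets: each CXR may show several abnormal signs simultaneously,
# so each label gets an independent sigmoid (BCE) rather than a shared softmax.
criterion = nn.BCEWithLogitsLoss()

images = torch.randn(8, 3, 224, 224)                    # dummy CXR batch
labels = torch.randint(0, 2, (8, NUM_LABELS)).float()   # weak labels mined from reports

logits = model(images)          # shape: (8, 25)
loss = criterion(logits, labels)
loss.backward()
```

The per-label sigmoid is the standard choice for report-derived weak labels, since the label vector is not one-hot and label noise is confined to individual signs.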
Results
In three external test cohorts of 5,996 symptomatic patients, 2,130 screening examinees, and 1,804 community clinic patients, the mean AUC of the CNN for identifying the 25 abnormal signs reaches 0.866 ± 0.110, 0.891 ± 0.147, and 0.796 ± 0.157, respectively. In symptomatic patients, the CNN shows no significant difference from local radiologists in identifying 21 signs (p > 0.05) but performs worse for 4 signs (p < 0.05). In screening examinees, the CNN shows no significant difference for 17 signs (p > 0.05) but is poorer at classifying nodules (p = 0.013). In community clinic patients, the CNN shows no significant difference for 12 signs (p > 0.05) but performs better for 6 signs (p < 0.001).
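Per-sign AUCs of the kind reported above are conventionally computed one label at a time and then averaged. A minimal sketch with scikit-learn's roc_auc_score follows; the arrays here are synthetic placeholders, not the study's cohort data.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

NUM_LABELS = 25
rng = np.random.default_rng(0)

# Placeholder data: y_true is binary ground truth per sign,
# y_score is the per-sign sigmoid output of the classifier.
y_true = rng.integers(0, 2, size=(500, NUM_LABELS))
y_score = rng.random(size=(500, NUM_LABELS))

# One ROC AUC per abnormal sign, then mean ± standard deviation.
aucs = [roc_auc_score(y_true[:, k], y_score[:, k]) for k in range(NUM_LABELS)]
print(f"mean AUC = {np.mean(aucs):.3f} ± {np.std(aucs):.3f}")
```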
Conclusion
We construct and validate an effective CXR interpretation system based on natural language processing.
Funder
National Natural Science Foundation of China
Shanghai Jiao Tong University
Ministry of Science and Technology of the People’s Republic of China
Publisher
Springer Science and Business Media LLC
Cited by
9 articles.