BACKGROUND
Breast cancer is one of the most common malignant tumors in women and severely threatens women's health worldwide. There is an urgent need for an effective data management and processing system that helps collect, manage, and use the variables relevant to breast cancer diagnosis and treatment. Knowledge graphs, an important component of artificial intelligence, are well suited to this problem.
OBJECTIVE
This study aims to apply natural language processing (NLP) techniques to Chinese breast cancer mammography reports in order to identify and extract features related to breast cancer and to construct a knowledge graph for breast cancer diagnosis.
METHODS
This paper focuses on the knowledge graph frame structure and on feature extraction, which were the main challenges in constructing a Chinese breast cancer diagnosis knowledge graph. Based on mammography examination guidelines and specifications, as well as the clinical experience and recommendations of hospital experts, we defined entities, entity attributes, and entity relationships to construct the concept layer of the knowledge graph. We then extracted mammographic features from mammography examination reports using deep learning models and used these features to build a knowledge graph for breast cancer diagnosis.
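A concept layer of entities, attributes, and relationships can be sketched as plain data structures that are later turned into graph database statements. The following is a minimal illustration only: the entity labels (`Report`, `Mass`), the relationship type `HAS_FINDING`, and the attribute names are hypothetical and are not taken from the study's schema. The function builds a parameterized Cypher `MERGE` statement as a string and does not connect to a database.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple


@dataclass
class Entity:
    label: str                      # entity type in the concept layer (hypothetical names)
    name: str                       # unique name of this entity instance
    attributes: Dict[str, str] = field(default_factory=dict)


@dataclass
class Relation:
    source: Entity
    rel_type: str                   # relationship type (hypothetical names)
    target: Entity


def to_cypher(rel: Relation) -> Tuple[str, dict]:
    """Return a parameterized Cypher statement and its parameters."""
    # Labels and relationship types cannot be passed as Cypher parameters,
    # so they are restricted to plain identifiers before interpolation.
    for ident in (rel.source.label, rel.target.label, rel.rel_type):
        if not ident.isidentifier():
            raise ValueError(f"unsafe identifier: {ident!r}")
    query = (
        f"MERGE (a:{rel.source.label} {{name: $a_name}}) "
        f"SET a += $a_attrs "
        f"MERGE (b:{rel.target.label} {{name: $b_name}}) "
        f"SET b += $b_attrs "
        f"MERGE (a)-[:{rel.rel_type}]->(b)"
    )
    params = {
        "a_name": rel.source.name,
        "a_attrs": rel.source.attributes,
        "b_name": rel.target.name,
        "b_attrs": rel.target.attributes,
    }
    return query, params


# Hypothetical example: one report linked to one extracted finding.
report = Entity("Report", "report_001")
mass = Entity("Mass", "report_001_mass_1", {"shape": "irregular"})
query, params = to_cypher(Relation(report, "HAS_FINDING", mass))
print(query)
```

Using `MERGE` rather than `CREATE` keeps the import idempotent, so re-running it on the same report does not duplicate nodes or relationships.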
RESULTS
When annotating mammography examination reports for the NLP tasks, we identified 15 important types of mammographic features. To improve the versatility of the constructed knowledge graph, we added 7 additional types of mammographic features. Mammographic features were extracted from a total of 1171 mammography examination reports. For the overall results of the model, the recognition accuracy is 98.97%, the precision is 97.16%, the recall is 98.06%, and the F1 score is 97.61%. Based on the structure of the concept layer of the knowledge graph, we imported the demographic risk factors and the mammographic features extracted from the text reports into the Neo4j graph database to complete the construction of the knowledge graph.
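The reported figures can be checked for internal consistency. Reading the 97.16% value as precision and the 98.06% value as recall, the F1 score is their harmonic mean, which reproduces the reported 97.61. The short sketch below performs only this arithmetic check; the function name is illustrative.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (both given as fractions)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Values reported in the results, expressed as fractions.
f1 = f1_score(0.9716, 0.9806)
print(f"F1 = {100 * f1:.2f}%")
```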
CONCLUSIONS
We constructed a breast cancer diagnosis knowledge graph based on Chinese electronic medical records. Evaluation of the design of the concept layer, the construction of the data layer, and the functions of the application layer demonstrates the rationality, effectiveness, and practicability of the knowledge graph. This study provides a reference for the rapid design and construction of knowledge graphs for the diagnosis and treatment of other diseases.