BACKGROUND
Atherosclerotic cerebrovascular disease causes a great number of deaths and disabilities, yet it has not received enough attention. To date, little information, few statistics, and no clear consensus on the disease have been published. Thus, no systematic concept dataset has been released to help clinicians in the field clarify its scope, assist research, and deliver maximum value.
OBJECTIVE
The aims of this study were to (1) develop a comprehensive cross-lingual atherosclerotic cerebrovascular disease ontology; (2) describe the workflow, schema, hierarchical structure, and highlighted content of the ontology; (3) design a brand-new rehabilitation ontology, an important part overlooked in existing ontologies; (4) evaluate the proposed ontology; and (5) apply the proposed ontology to real-world scenarios and electronic health records to realize information retrieval, named entity recognition, novel expression discovery, and knowledge fusion.
METHODS
We implemented 9 steps based on the Ontology Development 101 methodology combined with expert opinions. The steps were clinical requirements collection and specification, background investigation and knowledge acquisition, ontology selection and reuse, scope identification, schema definition, concept extraction, concept extension, ontology verification, and ontology evaluation.
RESULTS
The current ontology includes 10 top-level classes: clinical manifestation, comorbidity, complication, diagnosis, model of atherosclerotic cerebrovascular disease, pathogenesis, prevention, rehabilitation, risk factor, and treatment. In total, the 11-level ontology contains 1715 concepts, covering 4588 Chinese terms, 6617 English terms, and 972 definitions. The ontology can be applied in real-world scenarios such as information retrieval, new expression discovery, named entity recognition, and knowledge fusion, and the use cases showed that it offers satisfactory support for related medical scenarios.
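As a rough illustration of how a cross-lingual concept hierarchy of this kind might support term lookup and dictionary-based entity matching, consider the minimal Python sketch below. The schema (`Concept`, `parent_id`), the helper names, the concept IDs, and the two sample entries are all assumptions made for illustration; they are not taken from the published ontology or its tooling.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Concept:
    """One ontology concept with cross-lingual terms (hypothetical schema)."""
    concept_id: str
    english_terms: List[str]
    chinese_terms: List[str]
    parent_id: Optional[str] = None  # None marks a top-level class
    definition: Optional[str] = None


def build_term_index(concepts: List[Concept]) -> Dict[str, str]:
    """Map every term (English terms lowercased) to its concept ID."""
    index: Dict[str, str] = {}
    for concept in concepts:
        for term in concept.english_terms:
            index[term.lower()] = concept.concept_id
        for term in concept.chinese_terms:
            index[term] = concept.concept_id
    return index


def path_to_root(concept_id: str, by_id: Dict[str, Concept]) -> List[str]:
    """Return concept IDs from the given concept up to its top-level class."""
    path: List[str] = []
    current: Optional[str] = concept_id
    while current is not None:
        path.append(current)
        current = by_id[current].parent_id
    return path


def find_mentions(text: str, index: Dict[str, str]) -> List[str]:
    """Naive substring matching: sorted IDs of concepts whose terms occur in text."""
    lowered = text.lower()
    return sorted({cid for term, cid in index.items() if term in lowered})


# Illustrative entries only; IDs and terms are not from the published ontology.
CONCEPTS = [
    Concept("C001", ["risk factor"], ["危险因素"]),
    Concept("C002", ["hypertension", "high blood pressure"], ["高血压"],
            parent_id="C001"),
]
BY_ID = {c.concept_id: c for c in CONCEPTS}
INDEX = build_term_index(CONCEPTS)
```

Under these assumptions, `find_mentions("History of High Blood Pressure", INDEX)` and `find_mentions("患者有高血压病史", INDEX)` resolve to the same concept ID, which is the property that makes a cross-lingual term set useful for retrieval over mixed-language records. Plain substring matching is chosen here only for brevity; a real pipeline would likely need tokenization and disambiguation.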
CONCLUSIONS
The proposed ontology provides a clear set of cross-lingual concepts and terms with an explicit hierarchical structure, helping researchers quickly retrieve relevant medical literature, helping data scientists efficiently identify relevant content in electronic health records, and providing a clear domain framework for academic reference.