BACKGROUND
Dictionary-based named-entity recognition (NER) using a standardized terminology can express the relationships between compound terms extracted from radiology reports. However, it is less accurate than methods based on machine learning.
OBJECTIVE
To improve the accuracy of terminology extraction in NER, we attempted to expand a terminology dictionary based on the RadLex ontology, a representative standardized terminology in radiology. After examining the trends of words appearing in radiology reports, we added terms that could not be recognized with RadLex to the dictionary of the analysis tools and then evaluated the accuracy of these terms.
METHODS
We used 163,201 findings and impressions entries from MIMIC-III to extract candidate words for dictionary expansion with Word2Vec. We examined which Word2Vec parameters yield the most appropriate similar words for lexicon expansion.
RESULTS
The best synonyms were obtained with the skip-gram algorithm using hierarchical softmax, trained for 7 epochs.
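As a rough illustration of the reported setting, the sketch below shows how skip-gram with hierarchical softmax and 7 epochs might be configured, assuming the gensim 4.x implementation of Word2Vec. The abstract does not name the library, and `vector_size`, `window`, `min_count`, and `seed` are placeholder values that the text does not report; the helper `train_word2vec` is hypothetical.

```python
# Settings named in the text: skip-gram, hierarchical softmax, 7 epochs.
# All other values are assumed placeholders, not reported in the abstract.
W2V_PARAMS = {
    "sg": 1,            # 1 = skip-gram (0 would select CBOW)
    "hs": 1,            # 1 = hierarchical softmax
    "negative": 0,      # disable negative sampling so only hierarchical softmax is used
    "epochs": 7,        # number of passes over the corpus
    "vector_size": 100, # assumption
    "window": 5,        # assumption
    "min_count": 5,     # assumption
    "seed": 0,          # assumption, for repeatability
}


def train_word2vec(tokenized_reports, params=None):
    """Train a Word2Vec model on an iterable of token lists.

    Hypothetical helper: gensim is imported lazily so that the
    configuration above can be inspected without the library installed.
    """
    from gensim.models import Word2Vec  # third-party dependency (assumed gensim 4.x)

    if params is None:
        params = W2V_PARAMS
    return Word2Vec(sentences=tokenized_reports, **params)
```

With a trained model, `model.wv.most_similar("small", topn=10)` would return candidate words paired with their cosine similarity to the query word.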
CONCLUSIONS
With these parameters, we can build a model, input the modifiers of compound words, and append the resulting compound words to the dictionary in order of the output cosine similarity.
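The expansion step described here can be sketched in plain Python under some assumptions: the toy vectors, the example words, the `topn` cutoff, and the function names are all hypothetical, and candidates are appended in descending cosine order, which is one reading of "order of output cosine values". In practice the vectors would come from the trained Word2Vec model rather than a hand-written table.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_similar(modifier, vectors, topn=3):
    """Return up to topn (word, cosine) pairs most similar to the modifier."""
    query = vectors[modifier]
    scored = [(w, cosine(query, v)) for w, v in vectors.items() if w != modifier]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:topn]


def expand_dictionary(dictionary, modifier, head, vectors, topn=3):
    """Append new compounds 'similar_word head' in descending cosine order."""
    expanded = list(dictionary)  # leave the caller's list untouched
    for word, _score in rank_similar(modifier, vectors, topn):
        compound = f"{word} {head}"
        if compound not in expanded:
            expanded.append(compound)
    return expanded


# Hypothetical toy embeddings standing in for trained word vectors.
TOY_VECTORS = {
    "small": [0.9, 0.1, 0.0],
    "tiny": [0.85, 0.15, 0.05],
    "minute": [0.7, 0.3, 0.1],
    "pleural": [0.0, 0.2, 0.9],
}

new_dictionary = expand_dictionary(
    ["small nodule"], "small", "nodule", TOY_VECTORS, topn=2
)
```

Here the modifier "small" of the compound "small nodule" is the query, and the two nearest words produce the new entries "tiny nodule" and "minute nodule", in that order. Whether each candidate is a valid term would still need review before it is kept.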