Abstract
Medical free-text records store a lot of useful information that can be exploited in developing computer-supported medicine. However, extracting knowledge from unstructured text is difficult and depends on the language. In this paper, we apply Natural Language Processing methods to process raw medical texts in Polish and propose a new methodology for clustering patients' visits. We (1) extract medical terminology from a corpus of free-text clinical records, (2) annotate the data with medical concepts, (3) compute vector representations of medical concepts and validate them on the proposed term analogy tasks, (4) compute visit representations as vectors, (5) introduce a new method for clustering patients' visits and (6) apply the method to a corpus of 100,000 visits. We use several approaches to visual exploration that facilitate interpretation of the segments. With our method, we obtain stable and well-separated segments of visits, which are positively validated against final medical diagnoses. We show how an algorithm for segmentation of medical free-text records may be used to aid medical doctors. In addition, we share an implementation of the described methods, with examples, as an open-source package.
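Steps (4) and (5) can be illustrated with a minimal sketch. The abstract does not say how visit vectors are built or which clustering algorithm is used, so this sketch assumes that a visit is represented by the average of its concept vectors and that visits are grouped with plain k-means. The concept names, the vector values, and the helper names `visit_vector` and `kmeans` are hypothetical and are not taken from the authors' package.

```python
import random

# Toy concept embeddings (hypothetical values, for illustration only).
concept_vectors = {
    "cough": [0.9, 0.1, 0.0],
    "fever": [0.8, 0.2, 0.1],
    "rash": [0.1, 0.9, 0.0],
    "itching": [0.0, 0.8, 0.2],
}


def visit_vector(concepts, vectors):
    """Average the embeddings of the concepts annotated in one visit.

    Assumption: averaging is one simple way to build a visit vector;
    the paper may use a different aggregation.
    """
    known = [vectors[c] for c in concepts if c in vectors]
    if not known:
        raise ValueError("visit has no known concepts")
    return [sum(column) / len(known) for column in zip(*known)]


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iterations=20, seed=0):
    """Plain k-means (a stand-in for the paper's clustering method)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assign each point to its nearest centroid.
        labels = [
            min(range(k), key=lambda j: squared_distance(p, centroids[j]))
            for p in points
        ]
        # Move each centroid to the mean of its assigned points.
        for j in range(k):
            members = [p for p, label in zip(points, labels) if label == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Each visit is a list of annotated concepts (hypothetical data).
visits = [
    ["cough", "fever"],
    ["fever"],
    ["rash", "itching"],
    ["itching"],
]
points = [visit_vector(v, concept_vectors) for v in visits]
labels = kmeans(points, k=2)
print(labels)
```

In this toy setting the two respiratory-like visits fall into one segment and the two skin-related visits into the other. On real data, the resulting segments would then be compared against final diagnoses, as the abstract describes.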
Funder
Narodowe Centrum Badań i Rozwoju
Publisher
Springer Science and Business Media LLC
Subject
Artificial Intelligence, Computer Networks and Communications, Hardware and Architecture, Information Systems, Software
Cited by
8 articles.