Abstract
This paper presents an approach to Ontology-Based Information Extraction (OBIE) from unstructured text in Bulgarian. The proposed method and algorithm extract data from text documents automatically by exploiting ontologies. To this end, in addition to the standard tools for processing language resources available in open-source free software, a dictionary-based lemmatizer for Bulgarian has been developed and integrated. The lemmatizer is distributed as free software and is publicly available for download and use under the GPL v3 license. Given the specifics of inflection in Bulgarian, the lemmatization tools are expected to improve the results of the POS tagger. The approach also makes it possible to build a dynamically created gazetteer that, combined with a few other generic GATE resources, can produce ontology-based annotations over the given content with regard to the given ontology. The algorithm can also be applied to content creation and to information and knowledge management.
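The combination described above (dictionary-based lemmatization feeding a gazetteer generated from the ontology) can be illustrated with a minimal Python sketch. It is not the authors' implementation and does not use GATE's API. The lemma dictionary, the ontology URIs such as `ex:Sofia`, and the helper names `lemmatize`, `build_gazetteer` and `annotate` are all hypothetical. The point it shows is that matching on lemmas rather than surface forms lets an inflected form such as `столицата` hit the gazetteer entry for the label `столица`.

```python
import re
from dataclasses import dataclass

# Hypothetical toy lemma dictionary: lower-cased inflected form -> lemma.
# A real dictionary-based lemmatizer would load a full Bulgarian morphological lexicon.
LEMMA_DICT = {
    "градът": "град",
    "града": "град",
    "градове": "град",
    "столицата": "столица",
    "столици": "столица",
    "софийския": "софийски",
    "университета": "университет",
}

# Hypothetical ontology fragment: resource URI -> labels attached to it.
ONTOLOGY = {
    "ex:Sofia": ["София"],
    "ex:City": ["град"],
    "ex:Capital": ["столица"],
    "ex:SofiaUniversity": ["Софийски университет"],
}


@dataclass(frozen=True)
class Annotation:
    start: int     # index of the first matched token
    end: int       # index one past the last matched token
    text: str      # surface text of the matched tokens
    resource: str  # ontology resource the span is linked to


def tokenize(text):
    """Split text into word tokens (str regexes are Unicode-aware, so Cyrillic matches)."""
    return re.findall(r"\w+", text)


def lemmatize(token):
    """Look the token up in the dictionary; unknown words fall back to lower case."""
    key = token.lower()
    return LEMMA_DICT.get(key, key)


def build_gazetteer(ontology):
    """Build the gazetteer from the ontology: lemma sequence of each label -> resource."""
    gazetteer = {}
    for resource, labels in ontology.items():
        for label in labels:
            gazetteer[tuple(lemmatize(t) for t in tokenize(label))] = resource
    return gazetteer


def annotate(text, gazetteer):
    """Greedy longest-match lookup of lemmatized token spans in the gazetteer."""
    tokens = tokenize(text)
    lemmas = [lemmatize(t) for t in tokens]
    max_len = max((len(key) for key in gazetteer), default=0)
    annotations, i = [], 0
    while i < len(tokens):
        # Try the longest candidate span first, then shorter ones.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            resource = gazetteer.get(tuple(lemmas[i:i + n]))
            if resource is not None:
                annotations.append(
                    Annotation(i, i + n, " ".join(tokens[i:i + n]), resource))
                i += n
                break
        else:
            i += 1  # no gazetteer entry starts at this token
    return annotations


if __name__ == "__main__":
    gazetteer = build_gazetteer(ONTOLOGY)
    for a in annotate("София е столицата и най-големият град.", gazetteer):
        print(a.text, "->", a.resource)
    # София -> ex:Sofia
    # столицата -> ex:Capital
    # град -> ex:City
```

Because lookup is done on lemmas, the gazetteer needs only one entry per ontology label rather than one per inflected form, which is where a lemmatizer pays off for a highly inflected language.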