A hybrid ontology-based information extraction system

Author:

Gutierrez Fernando1,Dou Dejing1,Fickas Stephen1,Wimalasuriya Daya2,Zong Hui3

Affiliation:

1. University of Oregon, USA

2. University of Moratuwa, Sri Lanka

3. University of Virginia, USA

Abstract

Information Extraction is the process of automatically obtaining knowledge from plain text. Because of the ambiguity of written natural language, Information Extraction is a difficult task. Ontology-based Information Extraction (OBIE) reduces this complexity by including contextual information in the form of a domain ontology. The ontology provides guidance to the extraction process by providing concepts and relationships about the domain. However, OBIE systems have not been widely adopted because of the difficulties in deployment and maintenance. The Ontology-based Components for Information Extraction (OBCIE) architecture has been proposed as a form to encourage the adoption of OBIE by promoting reusability through modularity. In this paper, we propose two orthogonal extensions to OBCIE that allow the construction of hybrid OBIE systems with higher extraction accuracy and a new functionality. The first extension utilizes OBCIE modularity to integrate different types of implementation into one extraction system, producing a more accurate extraction. For each concept or relationship in the ontology, we can select the best implementation for extraction, or we can combine both implementations under an ensemble learning schema. The second extension is a novel ontology-based error detection mechanism. Following a heuristic approach, we can identify sentences that are logically inconsistent with the domain ontology. Because the implementation strategy for the extraction of a concept is independent of the functionality of the extraction, we can design a hybrid OBIE system with concepts utilizing different implementation strategies for extracting correct or incorrect sentences. Our evaluation shows that, in the implementation extension, our proposed method is more accurate in terms of correctness and completeness of the extraction. Moreover, our error detection method can identify incorrect statements with a high accuracy.

Publisher

SAGE Publications

Subject

Library and Information Sciences,Information Systems

Reference54 articles.

1. Jurafsky D, Martin JH. Speech and language processing: An introduction to natural language processing, 2nd edn. Englewood Cliffs, NJ: Prentice Hall, 2008, p. 725.

2. A translation approach to portable ontology specifications

3. Ontology-based information extraction: An introduction and a survey of current approaches

4. GO for gene documents

5. Using Information Extractors with the Neural ElectroMagnetic Ontologies

Cited by 14 articles. 订阅此论文施引文献 订阅此论文施引文献,注册后可以免费订阅5篇论文的施引文献,订阅后可以查看论文全部施引文献

同舟云学术

1.学者识别学者识别

2.学术分析学术分析

3.人才评估人才评估

"同舟云学术"是以全球学者为主线,采集、加工和组织学术论文而形成的新型学术文献查询和分析系统,可以对全球学者进行文献检索和人才价值评估。用户可以通过关注某些学科领域的顶尖人物而持续追踪该领域的学科进展和研究前沿。经过近期的数据扩容,当前同舟云学术共收录了国内外主流学术期刊6万余种,收集的期刊论文及会议论文总量共计约1.5亿篇,并以每天添加12000余篇中外论文的速度递增。我们也可以为用户提供个性化、定制化的学者数据。欢迎来电咨询!咨询电话:010-8811{复制后删除}0370

www.globalauthorid.com

TOP

Copyright © 2019-2024 北京同舟云网络信息技术有限公司
京公网安备11010802033243号  京ICP备18003416号-3