Affiliation:
1. Deutsches Forschungszentrum für Künstliche Intelligenz, Kaiserslautern, Germany
Abstract
Currently, the World Wide Web can be divided into two separate fields: the traditional Web of Documents, consisting of hyperlinked web documents, and the emerging Web of Data, consisting of linked open data. We present ontology-based information extraction as a core technology for bridging the gap between the two. Based on this, we present three basic applications that integrate web data into web documents. Our SCOOBIE system extracts information from a linked open dataset that is mentioned as textual phrases in web documents. SCOOBIE returns machine-interpretable metadata summarizing the content of a web document from the perspective of that linked open dataset. Building on SCOOBIE, we present EPIPHANY, a system that returns the extracted metadata to the originating web document in the form of semantic annotations. This allows users to query the Web of Data for more information about annotated subjects inside the web document. STERNTALER is a system that analyses metadata extracted from the results of a search engine. It generates semantic filters populated with facets of the things that were extracted from the web documents in the search results. This allows users to filter for those web documents that contain information about specific subjects and facets.