Author:
Gentile Anna Lisa, Zhang Ziqi, Ciravegna Fabio
Abstract
Information extraction (IE) is the technique of transforming unstructured textual data into a structured representation that can be understood by machines. The exponential growth of the Web generates an exceptional quantity of data for which automatic knowledge capture is essential. This work describes the methodology for web-scale information extraction in the LODIE project (linked open data information extraction) and highlights results from the early experiments carried out in the initial phase of the project. LODIE aims to develop information extraction techniques that scale to the web and adapt to user information needs. The core idea behind LODIE is the use of linked open data, a very large-scale information resource, as a ground-breaking solution for IE, since it provides invaluable annotated data on a growing number of domains. This article has two objectives. The first is to describe the LODIE project as a whole and outline its general challenges and directions. The second is to describe some initial steps taken towards the general solution, focusing on a specific IE subtask, wrapper induction.
Publisher
Association for the Advancement of Artificial Intelligence (AAAI)
Cited by
5 articles.
1. Self-Training for Label-Efficient Information Extraction from Semi-Structured Web-Pages;Proceedings of the VLDB Endowment;2023-07
2. Job Type Extraction for Service Businesses;Companion Proceedings of the ACM Web Conference 2023;2023-04-30
3. WebKE;Proceedings of the 30th ACM International Conference on Information & Knowledge Management;2021-10-26
4. Semi-automatic Knowledge Graph Construction by Relation Pattern Extraction;Computación y Sistemas;2019-10-07
5. CERES;Proceedings of the VLDB Endowment;2018-06