Affiliation:
1. University of Haifa, Haifa, Israel
Abstract
We consider the task of record extraction from text documents, where the goal is to automatically populate the fields of target relations, such as scientific seminars or corporate acquisition events. The record-extraction process involves several inferences, including mention detection, unification, and field assignment. We use structured learning to find the appropriate field-value assignments. Unlike previous work, the proposed approach generates feature-rich models that capture domain semantics and structural coherence at all levels and across fields. Given labeled examples, such an approach can, for instance, learn likely event durations and the fact that start times should precede end times. Although the inference space is large, effective learning is achieved using a perceptron-style method and simple, greedy beam decoding. A main focus of this article is the practical aspects of implementing the proposed framework for real-world applications. We argue and demonstrate that this approach is favorable under data shift, a real-world setting in which models learned from a limited set of labeled examples are applied to examples drawn from a different data distribution. We attribute much of the framework’s robustness to its modeling of domain knowledge. We describe design and implementation details for a case study of seminar event extraction from email announcements, and discuss design adaptations across different domains and text genres.
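The abstract does not spell out the model, so the following is only a minimal, hypothetical sketch of the general technique it names: structured perceptron training with greedy beam decoding over field assignments. It assumes a toy seminar schema (`stime`, `etime`, `location`), pre-extracted candidate mentions tagged with a coarse type and a minute value, and hand-picked feature templates, including one cross-field feature that checks whether the start time precedes the end time. All names (`FIELDS`, `features`, `beam_decode`, `train`) are illustrative. It uses the plain perceptron update with full decoding (no early update or weight averaging), which may differ from the article's actual method.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

# Hypothetical toy schema; the article's actual field set and features may differ.
FIELDS = ["stime", "etime", "location"]

# A candidate mention: (surface text, coarse type, value). For times the value is
# minutes since midnight; for other mention types it is unused (0).
Mention = Tuple[str, str, int]
Record = Dict[str, Optional[int]]  # field -> index into the mention list, or None


def features(mentions: List[Mention], record: Record) -> Dict[str, float]:
    """Field-level features plus one cross-field coherence feature."""
    phi: Dict[str, float] = defaultdict(float)
    for field, idx in record.items():
        if idx is None:
            phi[f"{field}=EMPTY"] += 1.0
        else:
            phi[f"{field}:type={mentions[idx][1]}"] += 1.0
    start, end = record.get("stime"), record.get("etime")
    if start is not None and end is not None:
        # Structural coherence: does the start time come before the end time?
        ordered = mentions[start][2] < mentions[end][2]
        phi["start<end" if ordered else "start>=end"] += 1.0
    return phi


def score(w: Dict[str, float], phi: Dict[str, float]) -> float:
    return sum(w.get(f, 0.0) * v for f, v in phi.items())


def beam_decode(mentions: List[Mention], w: Dict[str, float], beam_size: int = 4) -> Record:
    """Fill fields one at a time, keeping only the top-scoring partial records."""
    beam: List[Record] = [{}]
    for field in FIELDS:
        candidates: List[Record] = []
        for partial in beam:
            for idx in [*range(len(mentions)), None]:
                if idx is not None and idx in partial.values():
                    continue  # a mention fills at most one field
                candidates.append({**partial, field: idx})
        # Python's sort is stable (also with reverse=True): ties keep generation order.
        candidates.sort(key=lambda r: score(w, features(mentions, r)), reverse=True)
        beam = candidates[:beam_size]
    return beam[0]


def train(data: List[Tuple[List[Mention], Record]], epochs: int = 5,
          beam_size: int = 4) -> Dict[str, float]:
    """Structured perceptron: on each mistake, w += phi(gold) - phi(predicted)."""
    w: Dict[str, float] = defaultdict(float)
    for _ in range(epochs):
        for mentions, gold in data:
            pred = beam_decode(mentions, w, beam_size)
            if pred != gold:
                for f, v in features(mentions, gold).items():
                    w[f] += v
                for f, v in features(mentions, pred).items():
                    w[f] -= v
    return w


if __name__ == "__main__":
    data = [
        ([("3:00 PM", "time", 900), ("4:30 PM", "time", 990), ("Wean 5409", "loc", 0)],
         {"stime": 0, "etime": 1, "location": 2}),
        ([("Room 101", "loc", 0), ("5 PM", "time", 1020), ("2 PM", "time", 840)],
         {"stime": 2, "etime": 1, "location": 0}),
    ]
    w = train(data)
    new_doc = [("5:00 PM", "time", 1020), ("Gates 4B", "loc", 0), ("4:00 PM", "time", 960)]
    print(beam_decode(new_doc, w))  # {'stime': 2, 'etime': 0, 'location': 1}
```

In this toy run, the second training document lists the end time before the start time, so type features alone cannot separate the two time mentions; the model only decodes it correctly after the `start<end` feature acquires a positive weight. That cross-field feature is what lets the unseen document, whose times also appear out of order, be assigned correctly.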
Publisher
Association for Computing Machinery (ACM)
Subject
Artificial Intelligence, Theoretical Computer Science
Cited by
2 articles.