Affiliation:
1. University of Geneva
2. University of Edinburgh
Abstract
As in many natural language processing tasks, data-driven models based on supervised learning have become the method of choice for semantic role labeling. These models are guaranteed to perform well when given sufficient amount of labeled training data. Producing this data is costly and time-consuming, however, thus raising the question of whether unsupervised methods offer a viable alternative. The working hypothesis of this article is that semantic roles can be induced without human supervision from a corpus of syntactically parsed sentences based on three linguistic principles: (1) arguments in the same syntactic position (within a specific linking) bear the same semantic role, (2) arguments within a clause bear a unique role, and (3) clusters representing the same semantic role should be more or less lexically and distributionally equivalent. We present a method that implements these principles and formalizes the task as a graph partitioning problem, whereby argument instances of a verb are represented as vertices in a graph whose edges express similarities between these instances. The graph consists of multiple edge layers, each one capturing a different aspect of argument-instance similarity, and we develop extensions of standard clustering algorithms for partitioning such multi-layer graphs. Experiments for English and German demonstrate that our approach is able to induce semantic role clusters that are consistently better than a strong baseline and are competitive with the state of the art.
Subject
Artificial Intelligence,Computer Science Applications,Linguistics and Language,Language and Linguistics
Cited by
11 articles.
订阅此论文施引文献
订阅此论文施引文献,注册后可以免费订阅5篇论文的施引文献,订阅后可以查看论文全部施引文献