Author:
ZHANG YAO-ZHONG, MATSUZAKI TAKUYA, TSUJII JUN'ICHI
Abstract
This paper examines the structural learning problem in supertagging, the task of assigning the most probable lexical entry to each word in a sentence. A supertagger is extremely important for a lexicalized grammar parser because an accurate supertagger can greatly reduce lexical ambiguity in the downstream parser. Supertagging is more challenging than conventional sequence labeling tasks (e.g., part-of-speech tagging) for two reasons. First, the supertags are numerous: they are the lexical entries defined in a lexicalized grammar, and they carry rich syntactic/semantic information. Second, the inter-supertag relation is more complex: a proper supertag assignment is expected to be compatible with the other supertag assignments in the sentence so that a parse tree can be constructed. The adjacent-label features commonly used in sequence labeling models (e.g., first-order edge features) are too coarse for supertagging, for which long-range information is extremely important. We propose two approaches that take long-range information into account in the supertagger's training stage. The dependency-informed supertagger uses word-to-word dependencies derived from a dependency parser to generate long-range features, which act as soft constraints during training. The forest-guided supertagger constrains the classifier to learn in a grammar-satisfying space, using a CFG filter to impose grammar constraints on the updates of the model parameters. Experiments show that the proposed structure-guided supertaggers perform significantly better than the baseline supertaggers, and that the improved supertaggers also improve the F-score of the final parser. Using the forest-guided supertagger in a shift-reduce HPSG parser, we achieved a competitive parsing performance of 89.31% F-score with higher parsing speed than that of a state-of-the-art HPSG parser.
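As a rough illustration of what "long-range features" derived from word-to-word dependencies might look like, the following Python sketch builds feature strings for each token from its local context and from its dependency head and dependents. It is not the authors' feature set: the function name `token_features`, the feature templates, and the example sentence with its head indices are all assumptions made for illustration only.

```python
from typing import List


def token_features(words: List[str], heads: List[int], i: int) -> List[str]:
    """Return feature strings for token i (hypothetical templates).

    heads[j] is the index of token j's head in a dependency tree,
    or -1 if token j is the root.
    """
    feats = [f"w={words[i]}"]

    # Local context: adjacent words, as in an ordinary sequence labeler.
    prev_w = words[i - 1] if i > 0 else "<s>"
    next_w = words[i + 1] if i + 1 < len(words) else "</s>"
    feats += [f"w-1={prev_w}", f"w+1={next_w}"]

    # Long-range context: the head of token i, which may be far away.
    h = heads[i]
    if h < 0:
        feats.append("head=ROOT")
    else:
        direction = "L" if h < i else "R"
        feats += [
            f"head={words[h]}",
            f"head_dir={direction}",
            f"head_dist={abs(h - i)}",
        ]

    # Long-range context: every token whose head is token i.
    for j, hj in enumerate(heads):
        if hj == i:
            feats.append(f"dep={words[j]}")

    return feats


# Assumed example: "gave" is the root and heads the other three words.
words = ["She", "gave", "him", "books"]
heads = [1, -1, 1, 1]
features = [token_features(words, heads, i) for i in range(len(words))]
```

Because these features are simply added to the classifier's input, a wrong dependency only contributes misleading evidence rather than ruling out a supertag, which is one way to read the abstract's description of them as soft constraints.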
Publisher
Cambridge University Press (CUP)
Subject
Artificial Intelligence, Linguistics and Language, Language and Linguistics, Software