Large-Scale Induction and Evaluation of Lexical Resources from the Penn-II and Penn-III Treebanks

Published:2005-09 Issue:3 Volume:31 Page:329-366
ISSN:0891-2017
Container-title:Computational Linguistics
Language:en
Short-container-title:Computational Linguistics

Author:

O'Donovan Ruth¹, Burke Michael¹, Cahill Aoife¹, van Genabith Josef¹, Way Andy¹

Affiliation:

1. Dublin City University

Abstract

We present a methodology for extracting subcategorization frames based on an automatic lexical-functional grammar (LFG) f-structure annotation algorithm for the Penn-II and Penn-III Treebanks. We extract syntactic-function-based subcategorization frames (LFG semantic forms) and traditional CFG category-based subcategorization frames as well as mixed function/category-based frames, with or without preposition information for obliques and particle information for particle verbs. Our approach associates probabilities with frames conditional on the lemma, distinguishes between active and passive frames, and fully reflects the effects of long-distance dependencies in the source data structures. In contrast to many other approaches, ours does not predefine the subcategorization frame types extracted, learning them instead from the source data. Including particles and prepositions, we extract 21,005 lemma frame types for 4,362 verb lemmas, with a total of 577 frame types and an average of 4.8 frame types per verb. We present a large-scale evaluation of the complete set of forms extracted against the full COMLEX resource. To our knowledge, this is the largest and most complete evaluation of subcategorization frames acquired automatically for English.
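As an illustration of what "probabilities with frames conditional on the lemma" could look like in practice, the sketch below estimates P(frame | lemma) by relative frequency over (lemma, frame) pairs. The relative-frequency estimator, the function name frame_probabilities, the frame notation, and the toy observations are all assumptions made for this example; the abstract does not specify them. The last line only checks the abstract's own arithmetic: 21,005 lemma frame types over 4,362 verb lemmas gives roughly 4.8 frame types per verb.

```python
from collections import Counter, defaultdict


def frame_probabilities(observations):
    """Estimate P(frame | lemma) by relative frequency.

    `observations` is an iterable of (lemma, frame) pairs. This estimator
    is an assumption for illustration, not a method confirmed by the abstract.
    """
    counts = defaultdict(Counter)
    for lemma, frame in observations:
        counts[lemma][frame] += 1
    return {
        lemma: {frame: n / sum(frames.values()) for frame, n in frames.items()}
        for lemma, frames in counts.items()
    }


# Toy, made-up observations (hypothetical frame notation, not data from the paper).
observations = [
    ("give", "subj,obj,obj2"),
    ("give", "subj,obj,obl:to"),
    ("give", "subj,obj,obl:to"),
    ("sleep", "subj"),
]
probs = frame_probabilities(observations)

# Average number of frame types per verb lemma, using the abstract's figures.
average_frames_per_verb = 21005 / 4362
print(round(average_frames_per_verb, 1))
```

Because the probabilities are normalized per lemma, the frame probabilities for each verb sum to 1, so frames can be compared within a verb but not directly across verbs.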

Publisher

MIT Press - Journals

Subject

Artificial Intelligence, Computer Science Applications, Linguistics and Language, Language and Linguistics

Link

https://www.mitpressjournals.org/doi/pdf/10.1162/089120105774321073

References: 5 articles.

1. On the order of words

2. Evaluating Automatic LFG F-Structure Annotation for the Penn-II Treebank

3. Extending the Coverage of a CCG System

Cited by 5 articles.

1. Frequency, acceptability, and selection: A case study of clause-embedding;Glossa: a journal of general linguistics;2020-11-04

2. Introduction: A multifaceted approach to verb classes;Linguistics;2013-01-28

3. Computational Aspects of Lexical Functional Grammar;Language and Linguistics Compass;2011-01

4. Comparison of Chinese Treebanks for Corpus-oriented HPSG Grammar Development;Journal of Natural Language Processing;2010

5. CCGbank: A Corpus of CCG Derivations and Dependency Structures Extracted from the Penn Treebank;Computational Linguistics;2007-09