Affiliations:
1. Instituto Federal de Goiás
2. Brazilian Research Lab
3. IBM Research
4. PUC-Rio
Abstract
We describe a structure learning system for unrestricted coreference resolution that relies on two key modeling techniques: latent coreference trees and automatic entropy-guided feature induction. Latent tree modeling makes the learning problem computationally feasible because it incorporates a meaningful hidden structure. In addition, automatic feature induction lets us efficiently build enhanced nonlinear models with linear learning algorithms. We present empirical results that highlight the contribution of each modeling technique used in the proposed system. The evaluation uses the multilingual unrestricted coreference datasets of the CoNLL-2012 Shared Task, which cover three languages: Arabic, Chinese, and English. We apply the same system to all three languages, with only minor adaptations to some language-dependent features, such as nested mentions and language-specific static pronoun lists. A previous version of this system was submitted to the closed track of the CoNLL-2012 Shared Task and achieved an official score of 58.69, the best among the competing systems. The only enhancement in the current version is the inclusion of candidate arcs linking nested mentions for Chinese, which raises the score for that language by almost 4.5 points. The current system achieves a score of 60.15, which corresponds to a 3.5% error reduction relative to the previous version's 58.69, and is the best-performing system for each of the three languages.
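To make the latent coreference tree idea concrete, here is a minimal Python sketch; it is not the authors' implementation. It assumes that mentions are numbered in document order, that each mention attaches either to an earlier mention or to an artificial root node (attaching to the root starts a new entity), and that arc scores come from some learned linear model. A hand-written dictionary stands in for those scores, and the names `ROOT`, `decode_latent_tree`, and `clusters_from_tree` are hypothetical. Because every candidate arc links a mention to an earlier node, the candidate graph is acyclic, so the maximum-scoring tree can be found by picking the best incoming arc for each mention independently; the coreference clusters are then the subtrees hanging directly off the root.

```python
from typing import Dict, List, Tuple

ROOT = 0  # hypothetical artificial root; mentions are numbered 1..n in document order

Arc = Tuple[int, int]  # (head, mention): head is ROOT or an earlier mention


def decode_latent_tree(n: int, score: Dict[Arc, float]) -> Dict[int, int]:
    """Return the highest-scoring parent (antecedent or ROOT) for each mention.

    Every candidate arc goes from an earlier node to a later one, so no cycle
    can form and the best tree is simply the best incoming arc per mention.
    A missing root arc is treated as having score 0.0 (an assumption).
    """
    parent: Dict[int, int] = {}
    for j in range(1, n + 1):
        candidates = [(score.get((ROOT, j), 0.0), ROOT)]
        candidates += [(score[(i, j)], i) for i in range(1, j) if (i, j) in score]
        # max over (score, head) tuples: ties go to the later, i.e. closer, head
        _, parent[j] = max(candidates)
    return parent


def clusters_from_tree(parent: Dict[int, int]) -> List[List[int]]:
    """Each subtree hanging directly off ROOT is one coreference cluster."""
    cluster_of: Dict[int, int] = {}
    for j in sorted(parent):  # a parent always precedes its child
        cluster_of[j] = j if parent[j] == ROOT else cluster_of[parent[j]]
    groups: Dict[int, List[int]] = {}
    for j, c in cluster_of.items():
        groups.setdefault(c, []).append(j)
    return sorted(groups.values())


if __name__ == "__main__":
    # Toy arc scores standing in for a learned model's output (hypothetical).
    toy_scores = {
        (1, 2): -1.0,
        (1, 3): 2.0, (2, 3): -0.5,
        (2, 4): 1.5, (3, 4): -1.0,
        (3, 5): 0.7, (4, 5): 0.3,
    }
    tree = decode_latent_tree(5, toy_scores)
    print(tree)                      # {1: 0, 2: 0, 3: 1, 4: 2, 5: 3}
    print(clusters_from_tree(tree))  # [[1, 3, 5], [2, 4]]
```

With these toy scores, mention 3 attaches to mention 1 and mention 5 to mention 3, while mention 4 attaches to mention 2, giving two entities. In a real system the arc scores would come from a trained model and the candidate arcs would typically be restricted by some filter; the sketch omits both.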
Subject
Artificial Intelligence, Computer Science Applications, Linguistics and Language, Language and Linguistics