Unlexicalized Transition-based Discontinuous Constituency Parsing

Author:

Coavoux Maximin (1), Crabbé Benoît (2, 3), Cohen Shay B. (4)

Affiliation:

1. University of Edinburgh, ILCC, United Kingdom. mcoavoux@inf.ed.ac.uk

2. Université Paris Diderot, Université Sorbonne Paris Cité, LLF, France

3. Institut Universitaire de France (IUF), France. bcrabbe@linguist.univ-paris-diderot.fr

4. University of Edinburgh, ILCC, United Kingdom. scohen@inf.ed.ac.uk

Abstract

Lexicalized parsing models are based on the assumptions that (i) constituents are organized around a lexical head and (ii) bilexical statistics are crucial to solve ambiguities. In this paper, we introduce an unlexicalized transition-based parser for discontinuous constituency structures, based on a structure-label transition system and a bi-LSTM scoring system. We compare it with lexicalized parsing models in order to address the question of lexicalization in the context of discontinuous constituency parsing. Our experiments show that unlexicalized models systematically achieve higher results than lexicalized models, and provide additional empirical evidence that lexicalization is not necessary to achieve strong parsing results. Our best unlexicalized model sets a new state of the art on English and German discontinuous constituency treebanks. We further provide a per-phenomenon analysis of its errors on discontinuous constituents.

Publisher

MIT Press - Journals

Subject

Artificial Intelligence, Computer Science Applications, Linguistics and Language, Human-Computer Interaction, Communication

Cited by 4 articles.

1. Discontinuous grammar as a foreign language; Neurocomputing; 2023-03

2. Discontinuous Combinatory Constituency Parsing; Transactions of the Association for Computational Linguistics; 2023

3. Multitask Pointer Network for multi-representational parsing; Knowledge-Based Systems; 2022-01

4. A survey of syntactic-semantic parsing based on constituent and dependency structures; Science China Technological Sciences; 2020-09-16
