Składnica: a constituency treebank of Polish harmonised with the Walenty valency dictionary

Author:

Woliński Marcin, Hajnicz Elżbieta

Abstract

This paper reports on developments in three interrelated linguistic resources for Polish. The first is Świgra 2, a rule-based constituency parser for Polish. The second is Składnica, a treebank built using Świgra 2. The third is the valency dictionary Walenty, which became available when work on the first two was already advanced. Because Walenty is much more comprehensive than the ad hoc dictionary previously used with Świgra, a decision was made to switch the parser and the treebank to the new dictionary. The switch required several modifications to the Świgra 2 parser, including the implementation of unlike coordination, the introduction of semantically motivated phrases, and support for non-standard case values. A semi-automated procedure to upgrade previously disambiguated trees in Składnica was required as well. Modifications introduced in the treebank during the upgrade included systematic changes of notation and the resolution of ambiguities newly introduced by the more detailed distinctions made in the dictionary. The procedure for confronting Składnica with the trees generated by the new version of Świgra 2 using Walenty allowed us to check all of these resources for consistency. This resulted in several corrections to both the treebank and the valency dictionary.

Funder

Ministerstwo Nauki i Szkolnictwa Wyższego

Publisher

Springer Science and Business Media LLC

Subject

Library and Information Sciences, Linguistics and Language, Education, Language and Linguistics

Cited by 2 articles.

1. ICON: A Linguistically-Motivated Large-Scale Benchmark Indonesian Constituency Treebank;ACM Transactions on Asian and Low-Resource Language Information Processing;2023-08-23

2. Constituency Parsing with Spines and Attachments;Computational Science – ICCS 2023;2023
