Affiliations:
1. Uppsala University, Department of Linguistics and Philology; RISE Research Institutes of Sweden. joakim.nivre@lingfil.uu.se
2. Linköping University, Department of Computer and Information Science. ali.basirat@liu.se
3. Uppsala University, Department of Linguistics and Philology; RISE Research Institutes of Sweden. luise.durlich@ri.se
4. Uppsala University, Department of Linguistics and Philology. adam.moss@lingfil.uu.se
Abstract
Dependency-based approaches to syntactic analysis assume that syntactic structure can be analyzed in terms of binary asymmetric dependency relations holding between elementary syntactic units. Computational models for dependency parsing almost universally assume that an elementary syntactic unit is a word, while the influential theory of Lucien Tesnière instead posits a more abstract notion of nucleus, which may be realized as one or more words. In this article, we investigate the effect of enriching computational parsing models with a concept of nucleus inspired by Tesnière. We begin by reviewing how the concept of nucleus can be defined in the framework of Universal Dependencies, which has become the de facto standard for training and evaluating supervised dependency parsers, and explaining how composition functions can be used to make neural transition-based dependency parsers aware of the nuclei thus defined. We then perform an extensive experimental study, using data from 20 languages to assess the impact of nucleus composition across languages with different typological characteristics, and utilizing a variety of analytical tools including ablation, linear mixed-effects models, diagnostic classifiers, and dimensionality reduction. The analysis reveals that nucleus composition gives small but consistent improvements in parsing accuracy for most languages, and that the improvement mainly concerns the analysis of main predicates, nominal dependents, clausal dependents, and coordination structures. Significant factors explaining the rate of improvement across languages include entropy in coordination structures and frequency of certain function words, in particular determiners. Analysis using dimensionality reduction and diagnostic classifiers suggests that nucleus composition increases the similarity of vectors representing nuclei of the same syntactic type.
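The abstract only outlines how composition functions make a transition-based parser aware of nuclei. As a rough illustration only (not the authors' implementation), the Python sketch below shows the general idea: when the parser attaches a function word (e.g. a determiner or auxiliary) to its head via a functional relation, the two vectors are composed into a single nucleus vector that replaces the head's representation. The parameters W and b, the tanh composition, and the relation set are illustrative assumptions.

```python
# Illustrative sketch of nucleus composition in a transition-based parser.
# Assumptions (not taken from the article): a single learned linear layer
# followed by tanh as the composition function, and a small set of
# Universal Dependencies relations treated as "functional".

import numpy as np

rng = np.random.default_rng(0)
DIM = 4  # toy vector dimensionality

# Hypothetical learned parameters of the composition function.
W = rng.normal(scale=0.1, size=(DIM, 2 * DIM))
b = np.zeros(DIM)

# Relations that trigger nucleus composition (illustrative subset).
FUNCTIONAL_RELATIONS = {"det", "case", "aux", "cop", "mark", "clf"}

def compose(head_vec: np.ndarray, dep_vec: np.ndarray) -> np.ndarray:
    """Merge a head vector and a function-word vector into one nucleus vector."""
    return np.tanh(W @ np.concatenate([head_vec, dep_vec]) + b)

def attach(head_vec: np.ndarray, dep_vec: np.ndarray, relation: str) -> np.ndarray:
    """Simulate an arc-creating transition: if the relation is functional,
    the head's representation is replaced by the composed nucleus vector;
    otherwise it is left unchanged."""
    if relation in FUNCTIONAL_RELATIONS:
        return compose(head_vec, dep_vec)
    return head_vec

# Toy example: "the cat" -- the determiner is composed into the noun's vector.
cat_vec = rng.normal(size=DIM)
the_vec = rng.normal(size=DIM)
nucleus_vec = attach(cat_vec, the_vec, "det")
print("noun vector:   ", np.round(cat_vec, 3))
print("nucleus vector:", np.round(nucleus_vec, 3))
```

In this sketch, subsequent transitions would operate on the nucleus vector rather than on the bare head-word vector, which is what allows the parser to treat multi-word nuclei as single syntactic units.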
Subject
Artificial Intelligence, Computer Science Applications, Linguistics and Language, Language and Linguistics
Cited by: 2 articles.