Affiliation:
1. Institute for Computational Linguistics "A. Zampolli" (ILC-CNR) - ItaliaNLP Lab, Pisa, Italy
Abstract
This paper investigates linguistic complexity across natural languages from a corpus-based perspective, adopting linguistic profiling as its methodological framework. We focus in particular on syntactic complexity and analyze the distribution of a set of features, taken as proxies of complexity phenomena at the sentence level, that were extracted from 63 treebanks annotated according to the Universal Dependencies formalism. Because all treebanks share this annotation scheme, the features model the same linguistic phenomena in every treebank, which allows a reliable comparison among languages. We show that our approach identifies tendencies of structural proximity between languages that are not necessarily in line with typologically supported classifications, thus yielding new corpus-based findings.
Subject
Linguistics and Language, Language and Linguistics
Cited by
1 article.