150 years of written Dutch

Published:2021-12-01 Issue:3 Volume:26 Page:339-362
ISSN:1384-5845
Container-title:Nederlandse Taalkunde
language:nl

Author:

Piersoul Jozefien¹, De Troij Robbert², Van de Velde Freek¹

Affiliation:

1. KU Leuven

2. KU Leuven & Radboud University

Abstract

In this article, we present a new corpus spanning 163 years of written Dutch. This Dutch Corpus of Contemporary and late Modern Periodicals (Dutch C-CLAMP) comprises 47,738 part-of-speech tagged articles published in Dutch periodicals from 1837 until 1999, totaling approximately 200 million tokens. We explain the measures we took to overcome the shortcomings of existing corpora of historical Dutch covering the same period. We provide a detailed description of how the corpus has been compiled and enriched. Several aspects are covered: text markup, preprocessing of the data (including foreign-language recognition and spelling normalization), and the enrichment of both the textual data and the metadata on the authors of the corpus files. We also carry out two case studies to illustrate the reliability of the corpus.

Publisher

Amsterdam University Press

Subject

General Earth and Planetary Sciences,General Environmental Science

Link

https://www.aup-online.com/content/journals/10.5117/NEDTAA2021.3.002.PIER?crawler=true&mimetype=application/pdf

References: 69 articles.

1. Polyglot: Distributed Word Representations for Multilingual NLP;Proceedings of the Seventeenth Conference on Computational Natural Language Learning,2013

2. Grammaticalization and the linguistic individual: new avenues in lifespan research;Linguistics Vanguard,2019

3. Modeling language change across the lifespan: individual trajectories in community change;Language Variation and Change,2016

Cited by 2 articles.

1. Chapter 8. Resemanticising ‘free’ variation;Studies in Language Companion Series;2023-10-15

2. Men use more complex language than women, but the difference has decreased over time: a study on 120 years of written Dutch;Linguistics;2022-10-19