Affiliation:
1. Laboratoire Langues, Textes, Traitements informatiques, Cognition UMR 8094
2. Laboratoire d'Informatique Gaspard-Monge UMR 8049
Abstract
The way in which authors express themselves is unique but changes over their lifetime. However, quantitative studies of this idiolectal evolution are rare. Using the Corpus for Idiolectal Research (CIDRE), which contains the dated works of 11 prolific 19th-century French fiction writers, we propose new methods to identify, quantify and describe these grammatical-stylistic changes using lexico-morphosyntactic patterns, also called motifs. To examine the strength of the chronological signal of change, we developed a method to calculate whether a distance matrix of literary works contains a stronger chronological signal than expected by chance. Ten of the 11 corpora showed a higher-than-chance chronological signal, leading us to conclude that the evolution of the idiolect is, in a mathematical sense, monotonic, which supports the rectilinearity hypothesis previously put forward in the stylometric literature. The rectilinear evolution of the idiolect found for most authors in CIDRE in turn enabled us to propose a machine learning task: predicting the year in which a work was written. For the majority of the authors in our corpus, both the accuracy and the amount of variance explained by the model were high, and we discuss why the technique might fail for the others. After applying a feature selection algorithm, we examined the most important features, i.e. the motifs that have the greatest influence on idiolectal evolution. We find that some of those features are stylistic and have been previously identified in qualitative literary studies. We report some remarkable stylistic constructions revealed by our algorithm to illustrate the kinds of stylistic patterns that can be extracted using our method.
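The idea of testing whether a distance matrix carries a stronger chronological signal than chance can be illustrated with a generic permutation test. The sketch below is not the authors' procedure, which the abstract does not specify. It assumes one plausible statistic, the Pearson correlation between pairwise stylistic distance and pairwise difference in publication year, and compares it with the same statistic under shuffled years. The function names, the choice of statistic and the toy data are all hypothetical.

```python
import random
from itertools import combinations
from statistics import mean


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / (vx * vy) ** 0.5


def chronological_signal(dist, years):
    """Assumed statistic: correlation between distance and year gap over all pairs of works."""
    pairs = list(combinations(range(len(years)), 2))
    distances = [dist[i][j] for i, j in pairs]
    year_gaps = [abs(years[i] - years[j]) for i, j in pairs]
    return pearson(distances, year_gaps)


def permutation_test(dist, years, n_perm=999, seed=0):
    """Return the observed signal and a one-sided permutation p-value."""
    rng = random.Random(seed)
    observed = chronological_signal(dist, years)
    shuffled = list(years)
    hits = 0
    for _ in range(n_perm):
        # Shuffling the dates breaks any link between distance and chronology.
        rng.shuffle(shuffled)
        if chronological_signal(dist, shuffled) >= observed:
            hits += 1
    # The +1 terms count the observed labelling as one of the permutations.
    return observed, (1 + hits) / (1 + n_perm)


# Toy data (hypothetical): eight works whose distances grow exactly with the year gap.
years = [1830, 1835, 1840, 1845, 1850, 1855, 1860, 1865]
dist = [[abs(a - b) for b in years] for a in years]
signal, p_value = permutation_test(dist, years)
print(signal, p_value)
```

A small p-value here indicates that works written close together in time are also stylistically closer than a random dating would produce. A rank-based statistic could be substituted for the Pearson correlation if only monotonicity, rather than linearity, is of interest.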
Publisher
CA: Journal of Cultural Analytics
Subject
Literature and Literary Theory,Arts and Humanities (miscellaneous),History,Computer Science (miscellaneous)
Cited by
1 article.