The Evolution of the Idiolect over the Lifetime: A Quantitative and Qualitative Study of French 19th Century Literature

Author:

Seminck Olga (1), Gambette Philippe (2), Legallois Dominique (1), Poibeau Thierry (1)

Affiliation:

1. Laboratoire Langues, Textes, Traitements informatiques, Cognition UMR 8094

2. Laboratoire d'Informatique Gaspard-Monge UMR 8049

Abstract

The way in which authors express themselves is unique but changes over their lifetime. However, quantitative studies of this idiolectal evolution are rare. Using the Corpus for Idiolectal Research (CIDRE), which contains the dated works of 11 prolific 19th-century French fiction writers, we propose new methods to identify, quantify and describe the grammatical-stylistic changes that take place, using lexico-morphosyntactic patterns, also called motifs. To examine the strength of the chronological signal of change, we developed a method to test whether a distance matrix of literary works contains a stronger chronological signal than expected by chance. Ten out of 11 corpora showed a higher-than-chance chronological signal, leading us to conclude that the evolution of the idiolect is, in a mathematical sense, monotonic, which supports the rectilinearity hypothesis previously put forward in the stylometric literature. The rectilinear evolution of the idiolect found for most authors in CIDRE in turn enabled us to propose a machine learning task: predicting the year in which a work was written. For the majority of the authors in our corpus, the accuracy and the amount of variance explained by the model were high; we discuss why the technique might fail for the others. After applying a feature selection algorithm, we examined the most important features, i.e. the motifs that have the greatest influence on idiolectal evolution. We find that some of those features are stylistic and have previously been identified in qualitative literary studies. We report some remarkable stylistic constructions revealed by our algorithm to illustrate the kinds of stylistic patterns that can be extracted using our method.
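The abstract does not spell out how the chronological signal is tested, so the following is only a minimal sketch of one common way to ask the same question: a Mantel-style permutation test that correlates stylistic distances with gaps in publication year and compares the result with shuffled years. This is a stand-in technique, not the authors' method. The function names, the toy years and the toy distance matrix are all hypothetical.

```python
import random
from itertools import combinations


def pearson(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


def chronological_signal(dist, years, n_perm=999, seed=0):
    """Mantel-style permutation test (an assumption, not the paper's method).

    dist  : square matrix of stylistic distances between works
    years : publication year of each work, in the same order as dist
    Returns the observed correlation and a one-sided permutation p-value.
    """
    pairs = list(combinations(range(len(years)), 2))
    style = [dist[i][j] for i, j in pairs]

    def score(ys):
        # Correlate stylistic distance with the gap in years for each pair.
        gaps = [abs(ys[i] - ys[j]) for i, j in pairs]
        return pearson(style, gaps)

    observed = score(years)
    rng = random.Random(seed)
    shuffled = list(years)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break the link between works and dates
        if score(shuffled) >= observed:
            hits += 1
    p_value = (hits + 1) / (n_perm + 1)
    return observed, p_value


# Toy data, invented for illustration: distance grows with the gap in years.
years = [1830, 1835, 1841, 1848, 1856, 1863]
dist = [[abs(a - b) / 10 for b in years] for a in years]
observed, p_value = chronological_signal(dist, years)
print(f"correlation={observed:.3f}, p={p_value:.3f}")
```

A small p-value here means that works written close together in time are stylistically closer than shuffled dates would produce, which is the kind of higher-than-chance chronological signal the abstract reports for ten of the 11 corpora.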

Publisher

CA: Journal of Cultural Analytics

Subject

Literature and Literary Theory, Arts and Humanities (miscellaneous), History, Computer Science (miscellaneous)

