Modeling Long-Range Dynamic Correlations of Words in Written Texts with Hawkes Processes

Published:2022-06-22 Issue:7 Volume:24 Page:858
ISSN:1099-4300
Container-title:Entropy
language:en

Author:

Ogura Hiroshi,Hanada Yasutaka,Amano Hiromi,Kondo Masato

Abstract

Words in written texts have been shown to fall into two groups, called Type-I and Type-II words. Type-I words exhibit long-range dynamic correlations in written texts, while Type-II words show no dynamic correlations of any kind. Although the stochastic process that generates Type-II words has been shown to be a superposition of Poisson point processes with various intensities, there is no definitive model for Type-I words. In this study, we introduce the Hawkes process, a kind of self-exciting point process, as a candidate for the stochastic process that generates Type-I words; that is, the purpose of this study is to establish that the Hawkes process is useful for modeling the occurrence patterns of Type-I words in real written texts. The relation between the Hawkes process and an existing model for Type-I words, in which the hierarchical structure of written texts is considered to play a central role in producing dynamic correlations, is also discussed.
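
To make the contrast between a Poisson process and a self-exciting process concrete, the following is a minimal sketch of a Hawkes simulation, not the authors' implementation. It assumes an exponential kernel, so that the conditional intensity is lambda(t) = mu + sum over past events t_i of alpha * exp(-beta * (t - t_i)); the kernel actually used in the paper may differ. The function name `simulate_hawkes` and the parameters `mu`, `alpha`, `beta`, and `t_max` are hypothetical, and treating "time" as position in the text is likewise an assumption made for illustration. Sampling uses Ogata's thinning algorithm.

```python
import math
import random


def simulate_hawkes(mu, alpha, beta, t_max, seed=0):
    """Simulate event times on (0, t_max] with Ogata's thinning algorithm.

    Assumed intensity (exponential kernel, an illustrative choice):
        lambda(t) = mu + sum_{t_i < t} alpha * exp(-beta * (t - t_i))
    Requires mu > 0; alpha / beta < 1 keeps the process stationary.
    """
    rng = random.Random(seed)
    events = []
    t = 0.0
    excitation = 0.0  # value of the self-exciting sum just after time t

    while True:
        # Between events the intensity only decays, so its current value
        # is a valid upper bound until the next accepted event.
        lam_bar = mu + excitation
        wait = rng.expovariate(lam_bar)
        t_candidate = t + wait
        if t_candidate > t_max:
            break

        # Decay the excitation to the candidate time and evaluate lambda there.
        decayed = excitation * math.exp(-beta * wait)
        lam = mu + decayed

        if rng.random() * lam_bar <= lam:
            # Accept: record the event and add its jump to the excitation.
            events.append(t_candidate)
            excitation = decayed + alpha
        else:
            # Reject: only the decay applies.
            excitation = decayed
        t = t_candidate

    return events


if __name__ == "__main__":
    # alpha = 0 removes self-excitation and leaves a homogeneous Poisson process.
    poisson_like = simulate_hawkes(mu=0.5, alpha=0.0, beta=1.0, t_max=200.0)
    clustered = simulate_hawkes(mu=0.5, alpha=0.8, beta=1.0, t_max=200.0)
    print(len(poisson_like), len(clustered))
```

Setting `alpha` to zero reduces the sketch to a homogeneous Poisson process with rate `mu`, which corresponds to a single component of the Poisson superposition described for Type-II words; a positive `alpha` makes each occurrence temporarily raise the rate of further occurrences, which produces clustered events.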

Publisher

MDPI AG

Subject

General Physics and Astronomy

Link

https://www.mdpi.com/1099-4300/24/7/858/pdf
