Affiliation:
1. College of Chinese Language and Literature, Zhoukou Normal University, Zhoukou, Henan, China.
Abstract
In this paper, webpage information is extracted with a directed crawling method to obtain a collection of Chinese-language literary texts, which is then processed by data cleaning, Chinese word segmentation, and de-duplication. Text mining techniques, including machine learning, the LDA model, and semantic networks, are applied to the acquired text for sentiment analysis, topic extraction, and linguistic association analysis. Based on the mined text, the paper assesses the text value and characterizes the language of literary works written in Chinese. Text value is quantified and graded using the PMC index model, and the linguistic features of the text, including punctuation, vocabulary, and sentences, are analyzed by constructing a linguistic feature model. Of the 10 Chinese literary works selected, eight, including Alive, have text values rated excellent. The most frequently used punctuation mark in the works is the period. The average word length is around 2.75, and the degree of dispersion among sentences is small.
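The three linguistic features reported above (punctuation use, word length, and sentence dispersion) can be illustrated with a minimal sketch. This is not the paper's linguistic feature model, whose details the abstract does not give. The function name `linguistic_features`, the punctuation set, the use of character counts for length, and the coefficient of variation as the dispersion measure are all assumptions made for illustration. The word list is assumed to come from a separate Chinese word segmenter.

```python
import re
from collections import Counter
from statistics import mean, pstdev

# Assumed punctuation inventory; the paper's actual set is not stated.
PUNCTUATION = "，。！？；：、“”‘’（）《》"
SENTENCE_END = "。！？"


def linguistic_features(text, tokens):
    """Compute simple punctuation, word, and sentence statistics.

    text   -- raw cleaned text of one work
    tokens -- the same text already segmented into words (hypothetical
              output of a Chinese word segmenter)
    """
    # Punctuation: frequency of each mark in the raw text.
    punct_counts = Counter(ch for ch in text if ch in PUNCTUATION)
    most_common = punct_counts.most_common(1)[0][0] if punct_counts else None

    # Vocabulary: average word length in characters, ignoring punctuation tokens.
    words = [t for t in tokens if t and not all(ch in PUNCTUATION for ch in t)]
    avg_word_len = mean(len(w) for w in words) if words else 0.0

    # Sentences: split on sentence-final marks and measure length in characters.
    sentences = [s for s in re.split(f"[{SENTENCE_END}]", text) if s.strip()]
    lengths = [len(s) for s in sentences]

    # Dispersion: coefficient of variation (population std / mean), an assumed measure.
    cv = pstdev(lengths) / mean(lengths) if lengths else 0.0

    return {
        "punct_counts": punct_counts,
        "most_common_punct": most_common,
        "avg_word_len": avg_word_len,
        "sentence_lengths": lengths,
        "sentence_len_cv": cv,
    }


if __name__ == "__main__":
    sample = "我活着。他也活着，很好。"
    sample_tokens = ["我", "活着", "。", "他", "也", "活着", "，", "很", "好", "。"]
    print(linguistic_features(sample, sample_tokens))
```

A small coefficient of variation would correspond to sentence lengths that cluster closely around their mean, which is one way to read a small degree of dispersion.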