A study on the classification of stylistic and formal features in English based on corpus data testing

Published: 2023-04-25 Volume: 9 Page: e1297
ISSN: 2376-5992
Container-title: PeerJ Computer Science
Language: en

Author:

Li Shuhui¹

Affiliation:

1. School of Foreign Studies, South China Agricultural University, Guangzhou, Guangdong, China

Abstract

Traditional algorithms that combine statistics and rules do not measure the internal cohesion of words, and the N-gram algorithm places no limit on N, so it generates a large number of invalid word strings, which costs time and reduces experimental efficiency. This article therefore first constructs a Chinese neologism corpus, adopts an improved multi-word pointwise mutual information (multi-PMI) measure, and sets a double threshold to filter new words. Branch entropy is used to calculate the probabilities between words. Finally, the N-gram algorithm is used to segment the preprocessed corpus. Multi-word mutual information and a double mutual information threshold are used to identify new words and improve recognition accuracy. Experimental results show that the proposed algorithm improves accuracy, recall and F-measure by 7%, 3% and 5% respectively. This can promote the sharing of language information resources so that people can intuitively and accurately obtain language information services from the internet.
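
The abstract names its ingredients (length-capped N-grams, multi-word PMI, a double threshold, branch entropy) but not their formulas, so the following Python sketch is only one plausible reading and not the article's implementation. It assumes that multi-word PMI is the minimum PMI over all binary splits of a candidate, that the double threshold means "reject below the lower bound, accept above the upper bound, and decide the middle band by branch entropy", and that probabilities are approximated by dividing counts by the text length. All function names and threshold values are hypothetical.

```python
import math
from collections import Counter


def ngram_counts(text, max_n):
    """Count every character n-gram of length 1..max_n (N is capped on purpose)."""
    counts = Counter()
    for n in range(1, max_n + 1):
        for i in range(len(text) - n + 1):
            counts[text[i:i + n]] += 1
    return counts


def multi_pmi(word, counts, total):
    """Assumed multi-word PMI: the minimum PMI over all binary splits of `word`."""
    p_word = counts[word] / total
    scores = []
    for k in range(1, len(word)):
        p_left = counts[word[:k]] / total
        p_right = counts[word[k:]] / total
        scores.append(math.log2(p_word / (p_left * p_right)))
    return min(scores)


def branch_entropy(word, text, side):
    """Entropy of the characters adjacent to `word` on the left or right side."""
    neighbours = Counter()
    start = text.find(word)
    while start != -1:
        idx = start - 1 if side == "left" else start + len(word)
        if 0 <= idx < len(text):
            neighbours[text[idx]] += 1
        start = text.find(word, start + 1)
    n = sum(neighbours.values())
    if n == 0:
        return 0.0
    return -sum(c / n * math.log2(c / n) for c in neighbours.values())


def candidate_words(text, max_n=4, pmi_low=1.0, pmi_high=4.0,
                    min_entropy=0.5, min_count=2):
    """Filter n-grams with a double PMI threshold (hypothetical values)."""
    counts = ngram_counts(text, max_n)
    total = len(text)  # rough normaliser for all n-gram probabilities
    accepted = []
    for word, freq in counts.items():
        if len(word) < 2 or freq < min_count:
            continue
        pmi = multi_pmi(word, counts, total)
        if pmi < pmi_low:
            continue  # below the lower threshold: reject outright
        if pmi >= pmi_high:
            accepted.append(word)  # above the upper threshold: accept outright
            continue
        # Middle band: require varied context on both sides of the candidate.
        left = branch_entropy(word, text, "left")
        right = branch_entropy(word, text, "right")
        if min(left, right) >= min_entropy:
            accepted.append(word)
    return accepted
```

On a toy string such as `"xabyabzabw"`, `candidate_words` keeps only `"ab"`: it is the sole repeated n-gram, its PMI falls between the two thresholds, and it is preceded and followed by three different characters. A real run would use a segmented Chinese corpus and thresholds tuned on held-out data.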

Funder

2022 Guangdong Provincial Philosophy and Social Sciences Planning Project

Publisher

PeerJ

Subject

General Computer Science

Link

https://peerj.com/articles/cs-1297.pdf
