Construction of English and American Literature Corpus Based on Machine Learning Algorithm

Published: 2022-06-02 Volume: 2022 Page: 1-9
ISSN: 1687-5273
Container-title: Computational Intelligence and Neuroscience
Language: en

Author:

Dai Qian¹

Affiliation:

1. School of Foreign Languages, Henan Polytechnic University, Jiaozuo 454003, Henan Province, China

Abstract

In China, the use of corpora in language teaching, especially in the teaching of English and American literature, is still at a preliminary research stage. It has various shortcomings and has not received due attention from front-line educators. Constructing an English and American literature corpus according to certain principles can effectively support the teaching of English and American literature. This paper investigates how to build such a corpus automatically. In the keyword extraction step, key phrases and keywords are combined. The similarity between atomic events is calculated with the TextRank algorithm, and the top N sentences with the highest similarity are then selected and sorted. Building on machine learning (ML) text classification methods, a combined classifier based on a support vector machine (SVM) and Naive Bayes (NB) is proposed. The experimental results show that, in terms of accuracy and recall, the combined algorithm achieves the best classification performance of the three methods compared. Its best accuracy, recall, and F value are 0.87, 0.9, and 0.89, respectively. The results also show that the method can quickly, accurately, and persistently obtain high-quality bilingual mixed web pages.
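The TextRank step mentioned in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: it assumes that graph nodes are plain sentences, that similarity is the normalized word-overlap measure commonly used with TextRank, and that a damping factor of 0.85 is used. The names `tokenize`, `similarity`, `textrank`, and `top_n` are hypothetical and chosen only for illustration.

```python
import math
import re


def tokenize(sentence):
    """Lowercase a sentence and split it into word tokens."""
    return re.findall(r"[a-z']+", sentence.lower())


def similarity(a, b):
    """Normalized word overlap between two sentences (assumed measure)."""
    words_a, words_b = set(tokenize(a)), set(tokenize(b))
    if len(words_a) < 2 or len(words_b) < 2:
        return 0.0  # log(1) == 0 would make the denominator zero
    overlap = len(words_a & words_b)
    return overlap / (math.log(len(words_a)) + math.log(len(words_b)))


def textrank(sentences, damping=0.85, iterations=50):
    """Score sentences with weighted PageRank over the similarity graph."""
    n = len(sentences)
    weights = [
        [similarity(sentences[i], sentences[j]) if i != j else 0.0
         for j in range(n)]
        for i in range(n)
    ]
    out_sum = [sum(row) for row in weights]  # total edge weight per node
    scores = [1.0] * n
    for _ in range(iterations):
        scores = [
            (1 - damping) + damping * sum(
                weights[j][i] / out_sum[j] * scores[j]
                for j in range(n) if out_sum[j] > 0
            )
            for i in range(n)
        ]
    return scores


def top_n(sentences, n):
    """Return the n highest-scoring sentences, best first."""
    scores = textrank(sentences)
    ranked = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)
    return [sentences[i] for i in ranked[:n]]
```

A sentence that shares no words with the others receives only the baseline score `1 - damping`, so it ranks last and falls out of the top N selection.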

Publisher

Hindawi Limited

Subject

General Mathematics, General Medicine, General Neuroscience, General Computer Science

Link

http://downloads.hindawi.com/journals/cin/2022/9773452.pdf

References: 22 articles.

1. Text categorization based on regularization extreme learning machine

2. POCASUM: policy categorizer and summarizer based on text mining and machine learning

3. A Short Text Classification Method Based on N-Gram and CNN

4. Automatic depression classification based on affective read sentences: Opportunities for text-dependent analysis

5. Knowledge transfer based on feature representation mapping for text classification

Cited by 3 articles.

1. An extended TF-IDF method for improving keyword extraction in traditional corpus-based research: An example of a climate change corpus;Data & Knowledge Engineering;2024-09

2. Collection and Automatic Analysis with Natural Language Processing on a Corpus of Andean Oral Literature Implemented on the Web;Lecture Notes in Networks and Systems;2024

3. Numerical Analysis and Optimization of English Reading Corpus for Feature Extraction;Wireless Communications and Mobile Computing;2022-09-06