Affiliation:
1. School of Foreign Languages and Literature, Tianjin University, Tianjin 300350, China
Abstract
To extract useful information from English text accurately, this paper studies English text analysis combined with a genetic algorithm and establishes a text analysis system. First, a text tendency analysis algorithm based on a genetic algorithm language model is proposed, and a Doc2vec text feature representation algorithm that integrates the LDA model is designed. Second, parallelization of the text analysis algorithm is studied, and a parallel model of the algorithm is designed on the Spark big data platform. Third, the process of English text tendency analysis is studied, and a Chinese text analysis system is designed and implemented on the big data platform, with modules for corpus intake, corpus annotation, corpus storage, model training, model verification, and others. To verify feasibility, the accuracy of the LDA-fused Doc2vec text feature representation algorithm is tested in the prototype system. The experimental results show that the fused text representation model achieves a high degree of recognition, with the area under the ROC curve (AUC) reaching 0.95. The text analysis algorithms used in the system are also tested, and the results show that the parallel algorithm greatly improves the efficiency of the system.
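For readers unfamiliar with the reported metric, the following is a minimal sketch of how the area under the ROC curve can be computed from classifier scores. It uses the standard pairwise-ranking definition of AUC and is not taken from the paper; the function name `roc_auc` and the toy labels and scores are assumptions chosen for illustration only and do not reproduce the paper's data or results.

```python
def roc_auc(labels, scores):
    """Return the AUC as the fraction of (positive, negative) pairs in which
    the positive example receives the higher score; ties count as half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = positive tendency, 0 = negative tendency.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.1]
auc = roc_auc(labels, scores)
print(f"AUC = {auc:.3f}")
```

An AUC of 1.0 means every positive example is ranked above every negative one, and 0.5 corresponds to random ranking, so a value of 0.95 indicates that the large majority of such pairs are ordered correctly.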
Subject
Computer Networks and Communications, Computer Science Applications
Cited by
4 articles.