Semantic-Based Classification of Long Texts on Higher Education in China

Author:

Li Chun1, Fei Yanying2

Affiliation:

1. School of Marxism, Dalian University of Technology, Dalian 116023, China

2. Faculty of Humanities and Social Sciences, Dalian University of Technology, Dalian 116086, China

Abstract

The development level of higher education (HE) is an important indicator of a country's level of development and its development potential. HE-related documents mirror the development process of HE. Research on HE has been developing rapidly in China, producing a huge number of texts, such as relevant policies, speech drafts, and yearbooks. Traditional manual classification of HE texts is inefficient and cannot cope with this volume. Moreover, direct classification performs poorly because HE texts tend to be long and form an imbalanced dataset. To solve these problems, this paper improves the convolutional neural network (CNN) to obtain the HE-CNN classification model for HE texts. Firstly, Chinese HE policies, speech drafts, and yearbooks (1979–2020) were downloaded from the official website of the Chinese Ministry of Education. In total, 463 files were collected and divided into four classes, namely, definition, task, method, and effect evaluation. To handle the huge number of HE texts, the Twitter-latent Dirichlet allocation (LDA) topic model was employed to extract word frequency and critical information, such as age and author, enhancing the training effect of the CNN. To address the dataset imbalance problem, CNN parameters were optimized repeatedly through comparative experiments, which further improved the training effect. Finally, the proposed HE-CNN model was found to be more effective and accurate than other classification models.
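The abstract does not describe the HE-CNN architecture, so the following is only a minimal sketch of the generic building block that CNN text classifiers usually rely on: one convolution filter slid over consecutive token embeddings, followed by ReLU and max-over-time pooling. The function name `conv1d_max_pool`, the toy embeddings, and the toy filter weights are all hypothetical illustrations, not values or code from the paper.

```python
from typing import List

Vector = List[float]


def conv1d_max_pool(embeddings: List[Vector], kernel: List[Vector], bias: float = 0.0) -> float:
    """Apply one text-CNN filter: convolve over token windows, ReLU, then max-over-time pool.

    embeddings: one vector per token (hypothetical toy values here).
    kernel: one weight vector per position in the filter window.
    """
    window = len(kernel)
    if len(embeddings) < window:
        raise ValueError("text is shorter than the filter window")

    activations = []
    for start in range(len(embeddings) - window + 1):
        total = bias
        for offset in range(window):
            token_vec = embeddings[start + offset]
            weights = kernel[offset]
            # Dot product between one token embedding and one filter row.
            total += sum(x * w for x, w in zip(token_vec, weights))
        activations.append(max(0.0, total))  # ReLU

    # Max-over-time pooling: one feature per filter, whatever the text length.
    return max(activations)


# Toy example: four tokens with 2-dimensional embeddings, one filter spanning 2 tokens.
toy_embeddings = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]]
toy_kernel = [[1.0, -1.0], [0.5, 0.5]]
feature = conv1d_max_pool(toy_embeddings, toy_kernel)
print(feature)  # 1.5
```

Because pooling collapses each filter's output to a single number, the feature vector has a fixed size regardless of document length, which is one reason this family of models is often applied to long texts. A full classifier would stack many such filters and feed the pooled features to a layer that outputs the four classes (definition, task, method, effect evaluation).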

Funder

Key Project of Liaoning Provincial Law Society

Publisher

Hindawi Limited

Subject

Modeling and Simulation

Cited by 1 article.
