Author:
Wang Xiaoxia, Xu Xiaozhong, Zhang Jiarui, Zhu Yue, Fan Yuhang, Xu Pengjing
Abstract
Implementing the National Science and Technology Innovation Strategy places exponentially growing demands on the knowledge services of literature information institutions. The subject thesaurus is the most important knowledge organization tool for information retrieval and can be widely used for semantic citation, organization and retrieval of literature resources. This study aims to develop an innovative algorithm for constructing a subject thesaurus from massive literature resource data by mining academic neologisms and the semantic relationships between those neologisms and the subject system. We first collect a literature corpus and carry out the corresponding data pre-processing. We then use the FastText model to mine academic neologisms and construct an automatic categorization model for them based on the BERT and TextCNN algorithms. The proposed algorithm is validated on 8.1 million multi-source, heterogeneous literature records in the field of marine disciplines. The results show that the algorithm can effectively replace 90% of the manual annotation workload, mine a large number of high-quality marine neologisms, and successfully build a marine science knowledge base with a pass rate of 82.6% in expert review, indicating high accuracy and some prospects for engineering application.
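The two stages named in the abstract (mining neologisms, then assigning them to the subject system) can be illustrated with a toy sketch. This is not the authors' algorithm: it swaps FastText embeddings and the BERT/TextCNN classifier for plain character n-gram overlap, loosely in the spirit of FastText subwords, and every name in it (`char_ngrams`, `mine_candidates`, `assign_category`, the sample thesaurus and documents) is hypothetical.

```python
from collections import Counter


def char_ngrams(term, n_min=3, n_max=5):
    """Character n-grams with boundary markers (a stand-in for FastText subwords)."""
    padded = f"<{term.lower()}>"
    return {
        padded[i:i + n]
        for n in range(n_min, n_max + 1)
        for i in range(len(padded) - n + 1)
    }


def jaccard(a, b):
    """Jaccard similarity of two sets; 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def mine_candidates(documents, thesaurus, min_count=2):
    """Terms seen at least min_count times that are not already in the thesaurus."""
    known = {t.lower() for terms in thesaurus.values() for t in terms}
    counts = Counter(tok for doc in documents for tok in doc.lower().split())
    return sorted(t for t, c in counts.items() if c >= min_count and t not in known)


def assign_category(term, thesaurus):
    """Pick the category whose closest known term shares the most n-grams."""
    grams = char_ngrams(term)
    best_cat, best_score = None, 0.0
    for cat, terms in thesaurus.items():
        score = max((jaccard(grams, char_ngrams(t)) for t in terms), default=0.0)
        if score > best_score:
            best_cat, best_score = cat, score
    return best_cat, best_score


# Hypothetical toy data, not taken from the paper.
thesaurus = {
    "biology": ["plankton", "phytoplankton"],
    "geology": ["sediment", "sedimentation"],
}
documents = [
    "microplankton bloom observed",
    "microplankton sampling near sediment",
    "bloom",
]

for term in mine_candidates(documents, thesaurus):
    print(term, assign_category(term, thesaurus))
```

Candidates with no n-gram overlap come back with category `None`, which in a real pipeline would be the natural point to route a term to expert review.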
Subject
General Physics and Astronomy