Affiliation:
1. Chinese Academy of Sciences, Northwest Institute of Eco-Environment and Resources, Lanzhou 730000, China
Abstract
Fine-grained classification of academic papers uncovers their deeper implicit themes and semantics, which supports semantic retrieval, paper recommendation, research trend prediction, topic analysis, and related functions. Based on an ontology of the climate change domain, this study uses an unsupervised approach that combines two methods, syntactic structure analysis and semantic modeling, to build a subject-indexing framework for academic papers in that domain. The framework takes the titles, abstracts, and keywords of papers as input and automatically indexes a set of concept terms from the domain ontology as research topics. It relies on natural language processing techniques such as syntactic dependency parsing, text similarity calculation, pre-trained language models, and semantic similarity calculation, together with weighting factors such as word frequency statistics and graph path calculation. We evaluated the proposed method against a gold standard of manually annotated articles and found significant improvements over five alternative methods in precision, recall, and F1-score. Overall, the proposed method identifies the research topics of academic papers more accurately and provides a useful reference for applying domain ontologies and for unsupervised data annotation.
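The evaluation named in the abstract (precision, recall, and F1-score against manually annotated articles) can be illustrated with a minimal sketch. It assumes that the indexed topics and the gold-standard annotations for one paper are each treated as a set of ontology terms and compared by exact match. The abstract does not state how matches are counted or how scores are aggregated across papers, so the function name and the example terms below are hypothetical.

```python
def precision_recall_f1(predicted, gold):
    """Set-based precision, recall, and F1 for one paper's indexed terms.

    Assumption: a predicted term counts as correct only if it exactly
    matches a manually annotated (gold) term.
    """
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)

    # Guard against empty sets to avoid division by zero.
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0

    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical ontology terms, for illustration only.
predicted_terms = ["sea level rise", "glacier retreat", "carbon sink"]
gold_terms = ["sea level rise", "glacier retreat", "permafrost thaw", "albedo"]

p, r, f = precision_recall_f1(predicted_terms, gold_terms)
print(f"precision={p:.3f} recall={r:.3f} f1={f:.3f}")
```

Here two of the three predicted terms appear in the gold annotations, so precision is 2/3, recall is 2/4, and F1 is their harmonic mean.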
Funder
Youth Project of Gansu Provincial Social Science Planning
General Project of Gansu Provincial Social Science Planning
Subject
Management, Monitoring, Policy and Law; Renewable Energy, Sustainability and the Environment; Geography, Planning and Development; Building and Construction