Affiliation:
1. State Key Laboratory of Resources and Environmental Information System, Institute of Geographic Sciences and Natural Resources Research, Chinese Academy of Sciences, Beijing 100101, China
2. College of Resources and Environment, University of Chinese Academy of Sciences, Beijing 100049, China
3. China-Pakistan Joint Research Centre on Earth Sciences, Islamabad 45320, Pakistan
4. Jiangsu Centre for Collaborative Innovation in Geographical Information Resource Development and Application, Nanjing 210023, China
Abstract
Flood control is a global problem: the number of flood disasters induced by global climate change and extreme weather events increases every year. Flood studies, as recorded in the academic literature, are an important source of knowledge for flood risk reduction. The main objective of this paper was to acquire flood control knowledge from long-tail data in the literature by using deep learning techniques. Screening yielded 4742 flood-related academic documents from the past two decades. Machine learning was used to parse the documents, and 347 sample data points from different years were collected for sentence segmentation (approximately 61,000 sentences) and manual annotation. Traditional machine learning algorithms (naive Bayes (NB), logistic regression (LR), support vector machine (SVM), and random forest (RF)) and artificial neural network-based deep learning algorithms (Bert, Bert-CNN, Bert-RNN, and ERNIE) were implemented for model training, and complete sentence-level knowledge extraction was conducted in batches. The results revealed that the artificial neural network-based deep learning methods are more accurate than the traditional machine learning methods, but their training time is much longer. Based on comprehensive feature extraction capability and computational efficiency, the deep learning methods were ranked as ERNIE > Bert-CNN > Bert > Bert-RNN. With Bert as the benchmark model, each variant model showed its own applicable characteristics: Bert, Bert-CNN, and Bert-RNN were good at acquiring global features, acquiring local features, and processing variable-length inputs, respectively. ERNIE has an improved masking mechanism and corpus and therefore exhibited better performance. Finally, 124,196 usage-method sentences and 8935 quotation-method sentences were obtained in batches. The proportion of method sentences in the literature showed an increasing trend over the last 20 years. Thus, as literature with more method sentences accumulates, this study lays a foundation for knowledge extraction in the future.
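As an illustration of the sentence-level classification step, the following is a minimal sketch of one of the traditional baselines named in the abstract (NB), written in plain Python. It is not the authors' implementation: the class name, the three labels ("usage", "quotation", "other"), and the toy training sentences are all hypothetical and were invented here for demonstration only.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(sentence):
    """Lowercase a sentence and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", sentence.lower())


class NaiveBayesSentenceClassifier:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, sentences, labels):
        self.class_counts = Counter(labels)      # sentences per class
        self.word_counts = defaultdict(Counter)  # token counts per class
        self.vocab = set()
        for sentence, label in zip(sentences, labels):
            tokens = tokenize(sentence)
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        self.total = len(labels)
        return self

    def predict(self, sentence):
        # Tokens never seen in training are ignored.
        tokens = [t for t in tokenize(sentence) if t in self.vocab]
        best_label, best_score = None, -math.inf
        for label, count in self.class_counts.items():
            score = math.log(count / self.total)  # log prior
            denom = sum(self.word_counts[label].values()) + len(self.vocab)
            for t in tokens:
                score += math.log((self.word_counts[label][t] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical toy annotations; a real study would use the manually
# annotated sentences described in the abstract.
TRAIN = [
    ("We used a hydrological model to simulate flood peaks.", "usage"),
    ("This study applied a random forest to map flood risk.", "usage"),
    ("Smith et al. proposed a hydrodynamic model for flood routing.", "quotation"),
    ("Previous studies developed a neural network for flood forecasting.", "quotation"),
    ("Floods cause severe economic losses every year.", "other"),
    ("The basin is located in a humid monsoon region.", "other"),
]

clf = NaiveBayesSentenceClassifier().fit(
    [s for s, _ in TRAIN], [y for _, y in TRAIN]
)
print(clf.predict("We applied a hydrological model in this study."))
```

A bag-of-words model of this kind scores each token independently, which is one reason context-aware models such as Bert can be expected to separate usage sentences from quotation sentences more reliably, at the cost of longer training.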
Funder
National Natural Science Foundation of China
National Key R&D Program of China
Chinese Academy of Sciences Project
Construction Project of the China Knowledge Center for Engineering Sciences and Technology
Subject
Fluid Flow and Transfer Processes, Computer Science Applications, Process Chemistry and Technology, General Engineering, Instrumentation, General Materials Science