ChineseCTRE: A Model for Geographical Named Entity Recognition and Correction Based on Deep Neural Networks and the BERT Model

Author:

Zhang Wei 1,2,3; Meng Jingtao 2,3; Wan Jianhua 1; Zhang Chengkun 4; Zhang Jiajun 5; Wang Yuanyuan 4,6; Xu Liuchang 5,7,8; Li Fei 2,3

Affiliation:

1. College of Oceanography and Space Informatics, China University of Petroleum, Qingdao 266580, China

2. Land Surveying and Mapping Institute of Shandong Province, Jinan 250102, China

3. Shandong Province Engineering Technology Research Center for Spatial Information and Big Data Applications, Jinan 250102, China

4. School of Earth Sciences, Zhejiang University, Hangzhou 310058, China

5. College of Mathematics and Computer Science, Zhejiang Agriculture and Forestry University, Hangzhou 311300, China

6. Ocean Academy, Zhejiang University, Zhoushan 316021, China

7. College of Computer Science and Technology, Zhejiang University, Hangzhou 310063, China

8. Financial Big Data Research Institute, Sunyard Technology Co., Ltd., Hangzhou 310053, China

Abstract

Social media is widely used to share real-time information and report accidents during natural disasters, which has created a growing demand for identifying location names in social media text. Named entity recognition (NER), a fundamental task in geospatial information applications, extracts location names from natural language text. Named entity correction (NEC), a complementary task, helps ensure that extracted location names are accurate and thereby further improves NER accuracy. Although many methods have been applied to NER, including text-statistics-based and deep-learning-based approaches, research on NEC remains limited. To address this gap, we propose CTRE, a geospatial named entity recognition and correction model built on the BERT framework. We first apply incremental pre-training to BERT, using augmented VGI (volunteered geographic information) data and social media data, which raises the model's accuracy from 85% to 87%. We then follow the pre-training and fine-tuning paradigm of the BERT base model and extend the fine-tuning stage with a neural network framework to build a geospatial named entity recognition model and a geospatial named entity correction model, respectively. The recognition model reaches an F1 score of 0.9045, and the correction model achieves a precision of 0.9765. These results demonstrate the effectiveness of the proposed CTRE model and provide a reference for subsequent research on location names.
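The abstract reports precision and F1 but does not state how they were computed. As a minimal sketch, assuming entity-level evaluation over BIO-tagged sequences (a common convention for NER, not confirmed by this record), the metrics could be computed as follows. The function names `bio_to_spans` and `precision_recall_f1` and the `LOC` tag are hypothetical and chosen only for illustration.

```python
from typing import List, Set, Tuple

# (start index, end index exclusive, entity type); illustrative type alias
Span = Tuple[int, int, str]


def bio_to_spans(tags: List[str]) -> Set[Span]:
    """Collect entity spans from one BIO tag sequence."""
    spans: Set[Span] = set()
    start, label = None, None
    # A trailing "O" sentinel closes an entity that ends the sequence.
    for i, tag in enumerate(tags + ["O"]):
        continues = tag.startswith("I-") and label == tag[2:]
        if start is not None and not continues:
            spans.add((start, i, label))
            start, label = None, None
        if tag.startswith("B-"):
            start, label = i, tag[2:]
    return spans


def precision_recall_f1(
    gold: List[List[str]], pred: List[List[str]]
) -> Tuple[float, float, float]:
    """Entity-level scores: a prediction counts only on an exact span match."""
    tp = fp = fn = 0
    for g_tags, p_tags in zip(gold, pred):
        g_spans, p_spans = bio_to_spans(g_tags), bio_to_spans(p_tags)
        tp += len(g_spans & p_spans)
        fp += len(p_spans - g_spans)
        fn += len(g_spans - p_spans)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Toy example: two gold location names, one of them recovered.
gold = [["B-LOC", "I-LOC", "O", "B-LOC"]]
pred = [["B-LOC", "I-LOC", "O", "O"]]
precision, recall, f1 = precision_recall_f1(gold, pred)
print(precision, recall, f1)
```

In this toy case every predicted location is correct but one gold location is missed, so precision is perfect while recall, and therefore F1, is lower.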

Funder

Major Science and Technology Innovation Project of Shandong Province

National Natural Science Foundation of China

Natural Science Foundation of Zhejiang Province

Publisher

MDPI AG

Subject

Earth and Planetary Sciences (miscellaneous); Computers in Earth Sciences; Geography, Planning and Development

