Non-Standard Address Parsing in Chinese Based on Integrated CHTopoNER Model and Dynamic Finite State Machine

Author:

Zhang Mengwei1,Liu Xingui1,Ma Jingzhen1ORCID,Zhang Zheng1,Qiu Yue1ORCID,Jiang Zhipeng1

Affiliation:

1. Institute of Geospatial Information, Information Engineering University, Zhengzhou 450001, China

Abstract

Information in non-standard address texts in Chinese is usually presented with rough content, complex and diverse presentation forms, and inconsistent hierarchical granularity, causing low accuracy in Chinese address parsing. Therefore, we propose a method for parsing non-standard address text in Chinese that integrates the Chinese Toponym Named Entity Recognition (CHTopoNER) model and a dynamic finite state machine (FSM). First, named entity recognition is performed by the CHTopoNER model. Sets of dynamic FSMs are then constructed based on the address hierarchical characteristics to sort and combine the Chinese address elements, thereby achieving address parsing on the Chinese internet. This method showed excellent accuracy in parsing both standard and non-standard placename addresses. In particular, this method performed better in address parsing for disordered or missing hierarchical elements than traditional methods using an FSM. Specifically, this method achieved accuracies of 96.6% and 96.8% for standard and non-standard placenames, respectively. These accuracies increased by 8.0% and 57.1%, respectively, compared with the integrated CHTopoNER model and traditional FSM, and by 7.4% and 19.8%, respectively, compared with the integrated CHTopoNER model and bidirectional FSM. After analysis, the address-parsing method showed good scalability and adaptability, which could be applied to various types of address-parsing tasks.

Publisher

MDPI AG

Subject

Fluid Flow and Transfer Processes,Computer Science Applications,Process Chemistry and Technology,General Engineering,Instrumentation,General Materials Science

Reference55 articles.

1. Tian, Q., Ren, F., Hu, T., Liu, J., Li, R., and Du, Q. (2021). Using an optimized Chinese address matching method to develop a geocoding service: A case study of Shenzhen, China. ISPRS Int. J. Geo Inf., 5.

2. The Chinese address extraction method based on the address tree model;Kang;J. Surv. Mapp.,2015

3. Automated geocoding of textual documents: A survey of current approaches;Melo;Trans. GIS,2017

4. Spatial pattern analysis of address quality: A study on the impact of rapid urban expansion in China;Lin;Environ. Plan. B Urb. Anal. City Sci.,2021

5. Geoscience keyphrase extraction algorithm using enhanced word embedding;Qiu;Expert Syst. Appl.,2019

同舟云学术

1.学者识别学者识别

2.学术分析学术分析

3.人才评估人才评估

"同舟云学术"是以全球学者为主线,采集、加工和组织学术论文而形成的新型学术文献查询和分析系统,可以对全球学者进行文献检索和人才价值评估。用户可以通过关注某些学科领域的顶尖人物而持续追踪该领域的学科进展和研究前沿。经过近期的数据扩容,当前同舟云学术共收录了国内外主流学术期刊6万余种,收集的期刊论文及会议论文总量共计约1.5亿篇,并以每天添加12000余篇中外论文的速度递增。我们也可以为用户提供个性化、定制化的学者数据。欢迎来电咨询!咨询电话:010-8811{复制后删除}0370

www.globalauthorid.com

TOP

Copyright © 2019-2024 北京同舟云网络信息技术有限公司
京公网安备11010802033243号  京ICP备18003416号-3