Enhancing Chinese Address Parsing in Low-Resource Scenarios through In-Context Learning

Published:2023-07-22 Issue:7 Volume:12 Page:296
ISSN:2220-9964
Container-title:ISPRS International Journal of Geo-Information
language:en
Short-container-title:IJGI

Author:

Ling Guangming¹, Mu Xiaofeng¹,², Wang Chao³, Xu Aiping¹

Affiliation:

1. School of Computer Science, Wuhan University, Wuhan 430072, China

2. Wuhan Children’s Hospital (Wuhan Maternal and Child Healthcare Hospital), Tongji Medical College, Huazhong University of Science & Technology, Wuhan 430074, China

3. The State Key Laboratory of Information Engineering in Surveying, Mapping and Remote Sensing (LIESMARS), Wuhan University, Wuhan 430072, China

Abstract

Address parsing is a crucial task in natural language processing, particularly for Chinese addresses. The complex structure and semantic features of Chinese addresses make them inherently ambiguous and hard to parse. Additionally, different task scenarios require different levels of granularity in address components, further complicating the parsing process. To address these challenges and adapt to low-resource environments, we propose CapICL, a novel Chinese address parsing model based on the In-Context Learning (ICL) framework. CapICL combines a sequence generator, regular expression matching, BERT semantic similarity computation, and Generative Pre-trained Transformer (GPT) modeling to improve parsing accuracy by incorporating contextual information. We construct the sequence generator from a small annotated dataset, capturing the distribution patterns and boundary features of address types to model address structure and semantics, which mitigates interference from unnecessary variations. We introduce the REB–KNN algorithm, which selects similar samples for ICL-based parsing using regular expression matching and BERT semantic similarity computation. The selected samples, raw text, and explanatory text are combined into prompts and fed into the GPT model, which predicts the parsed address. Experimental results show that CapICL performs well in low-resource environments while reducing dependency on annotated data and computational resources. The results validate the model’s effectiveness, adaptability, and broad application potential in natural language processing and geographical information systems.
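
The sample-selection and prompt-building steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The abstract does not say how regular expression matching and BERT similarity are combined, so the sketch assumes a regex pre-filter followed by similarity ranking. A character-bigram cosine similarity stands in for BERT embeddings so the example runs without external libraries. The function names, patterns, label scheme, and sample addresses are all hypothetical, and the GPT call itself is omitted.

```python
import math
import re
from collections import Counter

def embed(text):
    # Stand-in for a BERT sentence embedding (assumption): character-bigram counts.
    return Counter(text[i:i + 2] for i in range(len(text) - 1))

def cosine(a, b):
    # Cosine similarity between two sparse count vectors.
    dot = sum(a[key] * b[key] for key in a if key in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def select_examples(query, pool, patterns, k=2):
    # Step 1 (regex): keep annotated samples that match a pattern the query also matches.
    matched = [p for p in patterns if re.search(p, query)]
    candidates = [s for s in pool if any(re.search(p, s["text"]) for p in matched)]
    if len(candidates) < k:
        candidates = pool  # too few structural matches: fall back to the whole pool
    # Step 2 (similarity): rank the candidates and keep the k nearest neighbours.
    q = embed(query)
    ranked = sorted(candidates, key=lambda s: cosine(q, embed(s["text"])), reverse=True)
    return ranked[:k]

def build_prompt(query, examples, instruction):
    # Combine explanatory text, selected samples, and the raw address into one prompt.
    lines = [instruction, ""]
    for ex in examples:
        lines += [f"Address: {ex['text']}", f"Parsed: {ex['labels']}", ""]
    lines += [f"Address: {query}", "Parsed:"]
    return "\n".join(lines)

# Hypothetical annotated pool and structural patterns, for illustration only.
POOL = [
    {"text": "湖北省武汉市武昌区珞喻路129号",
     "labels": "prov=湖北省 | city=武汉市 | district=武昌区 | road=珞喻路 | roadno=129号"},
    {"text": "武汉市洪山区珞狮路122号",
     "labels": "city=武汉市 | district=洪山区 | road=珞狮路 | roadno=122号"},
    {"text": "浙江省杭州市西湖区文三路",
     "labels": "prov=浙江省 | city=杭州市 | district=西湖区 | road=文三路"},
]
PATTERNS = [r"省.+市", r"路\d+号", r"区.+路"]
QUERY = "湖北省武汉市洪山区珞喻路37号"

selected = select_examples(QUERY, POOL, PATTERNS, k=2)
prompt = build_prompt(QUERY, selected, "Split each Chinese address into labelled components.")
print(prompt)
```

In a fuller pipeline, the printed prompt would be sent to a GPT-style model and the completion after the final "Parsed:" line would be read back as the predicted labels. Swapping `embed` for real BERT sentence vectors would change only the similarity step.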

Funder

National Key R&D Program of China

Key R&D Program of Hubei Province

National Natural Science Foundation of China program

Open Fund of National Engineering Research Centre for Geographic Information System

Publisher

MDPI AG

Subject

Earth and Planetary Sciences (miscellaneous); Computers in Earth Sciences; Geography, Planning and Development

Link

https://www.mdpi.com/2220-9964/12/7/296/pdf

References: 36 articles.

1. Wang (2020). NeuroTPR: A Neuro-net Toponym Recognition Model for Extracting Locations from Social Media Messages. Trans. GIS.

2. Tao, L., Xie, Z., Xu, D., Ma, K., Qiu, Q., Pan, S., and Huang, B. (2022). Geographic Named Entity Recognition by Employing Natural Language Processing and an Improved BERT Model. ISPRS Int. J. Geo-Inf., 11.

3. Stock (2018). Context-Aware Automated Interpretation of Elaborate Natural Language Descriptions of Location through Learning from Empirical Data. Int. J. Geogr. Inf. Sci.

4. Berragan (2023). Transformer Based Named Entity Recognition for Place Name Extraction from Unstructured Text. Int. J. Geogr. Inf. Sci.

5. Li, H., Lu, W., Xie, P., and Li, L. (2019, January 2–7). Neural Chinese Address Parsing. Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), Minneapolis, MN, USA.

Cited by 2 articles.

1. GPT, large language models (LLMs) and generative artificial intelligence (GAI) models in geospatial science: a systematic review. International Journal of Digital Earth, 2024-05-20.

2. Non-Standard Address Parsing in Chinese Based on Integrated CHTopoNER Model and Dynamic Finite State Machine. Applied Sciences, 2023-08-31.