Abstract
This article describes a novel approach for toponym resolution with deep neural networks. The proposed approach does not involve matching references in the text against entries in a gazetteer, instead directly predicting geo-spatial coordinates. Multiple inputs are considered in the neural network architecture (e.g., the surrounding words are considered in combination with the toponym to disambiguate), using pre-trained contextual word embeddings (i.e., ELMo or BERT) as well as bi-directional Long Short-Term Memory units, which are both regularly used for modeling textual data. The intermediate representations are then used to predict a probability distribution over possible geo-spatial regions, and finally to predict the coordinates for the input toponym. The proposed model was tested on three datasets used on previous toponym resolution studies, specifically the (i) War of the Rebellion, (ii) Local–Global Lexicon, and (iii) SpatialML corpora. Moreover, we evaluated the effect of using (i) geophysical terrain properties as external information, including information on elevation or terrain development, among others, and (ii) additional data collected from Wikipedia articles, to further help with the training of the model. The obtained results show improvements using the proposed method, when compared to previous approaches, and specifically when BERT embeddings and additional data are involved.
Funder
European Commission
Fundação para a Ciência e Tecnologia
Subject
Earth and Planetary Sciences (miscellaneous),Computers in Earth Sciences,Geography, Planning and Development
Cited by
9 articles.
订阅此论文施引文献
订阅此论文施引文献,注册后可以免费订阅5篇论文的施引文献,订阅后可以查看论文全部施引文献