Author:
DeLozier Grant,Baldridge Jason,London Loretta
Abstract
Toponym resolution, or grounding names of places to their actual locations, is an important problem in analysis of both historical corpora and present-day news and web content. Recent approaches have shifted from rule-based spatial minimization methods to machine learned classifiers that use features of the text surrounding a toponym. Such methods have been shown to be highly effective, but they crucially rely on gazetteers and are unable to handle unknown place names or locations. We address this limitation by modeling the geographic distributions of words over the earth's surface: we calculate the geographic profile of each word based on local spatial statistics over a set of geo-referenced language models. These geo-profiles can be further refined by combining in-domain data with background statistics from Wikipedia. Our resolver computes the overlap of all geo-profiles in a given text span; without using a gazetteer, it performs on par with existing classifiers. When combined with a gazetteer, it achieves state-of-the-art performance for two standard toponym resolution corpora (TR-CoNLL and Civil War). Furthermore, it dramatically improves recall when toponyms are identified by named entity recognizers, which often (correctly) find non-standard variants of toponyms.
Publisher
Association for the Advancement of Artificial Intelligence (AAAI)
Cited by
14 articles.
订阅此论文施引文献
订阅此论文施引文献,注册后可以免费订阅5篇论文的施引文献,订阅后可以查看论文全部施引文献