Abstract
It is common for researchers working on linguistic geography, language contact and typology to use some type of distance metric between lects. However, most work so far has used either Euclidean distances or geodesic distances, neither of which represents the real separation between communities very accurately. This paper presents two datasets: one of walking distances and one of topographic distances between over 8700 lects across all macro-areas. We calculated walking distances using OpenStreetMap data, and topographic distances using digital elevation data. We evaluate these distance metrics on three case studies and show that, of the four distances, the topographic and geodesic distances performed most consistently across datasets and are likely to be reasonable first choices. At the same time, in most cases the Euclidean distances were not much worse than the others, and may be a good enough approximation when performance is critical, or when the dataset covers very large areas and the point-location information is not very precise.
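To make the contrast between the metrics concrete, the following is a minimal sketch of three of them for point locations given as (latitude, longitude). It is not the authors' pipeline. Treating raw coordinates as planar (`euclidean_deg`), the spherical haversine formula with a mean Earth radius of 6371.0088 km (`geodesic_km`), and a Dijkstra least-cost path over a toy elevation grid whose step cost is the 3D surface length (`topographic_cost`) are all illustrative assumptions, and the function names are hypothetical. Walking distances over OpenStreetMap require a routing engine and are not sketched here.

```python
import heapq
import math

EARTH_RADIUS_KM = 6371.0088  # assumption: spherical Earth with mean radius


def euclidean_deg(p, q):
    """Naive Euclidean distance in degrees, treating (lat, lon) as planar."""
    return math.hypot(p[0] - q[0], p[1] - q[1])


def geodesic_km(p, q):
    """Great-circle distance in km via the haversine formula."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*p, *q))
    dlat, dlon = lat2 - lat1, lon2 - lon1
    a = (math.sin(dlat / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def topographic_cost(elev, start, goal, cell_km=1.0):
    """Least-cost path (Dijkstra) over a grid of elevations in metres.

    Hypothetical cost function: each move to one of the 8 neighbours costs
    the 3D length of the step (horizontal run in km, vertical rise in km).
    """
    rows, cols = len(elev), len(elev[0])
    best = {start: 0.0}
    heap = [(0.0, start)]
    while heap:
        d, (r, c) = heapq.heappop(heap)
        if (r, c) == goal:
            return d
        if d > best.get((r, c), math.inf):
            continue  # stale heap entry
        for dr in (-1, 0, 1):
            for dc in (-1, 0, 1):
                if dr == 0 and dc == 0:
                    continue
                nr, nc = r + dr, c + dc
                if 0 <= nr < rows and 0 <= nc < cols:
                    run = cell_km * math.hypot(dr, dc)
                    rise = (elev[nr][nc] - elev[r][c]) / 1000.0  # m -> km
                    nd = d + math.hypot(run, rise)
                    if nd < best.get((nr, nc), math.inf):
                        best[(nr, nc)] = nd
                        heapq.heappush(heap, (nd, (nr, nc)))
    return math.inf  # goal unreachable


flat = [[0, 0, 0], [0, 0, 0], [0, 0, 0]]
hill = [[0, 0, 0], [0, 1000, 0], [0, 0, 0]]
```

On the flat grid the cheapest route from (0, 0) to (2, 2) is the straight diagonal, whereas on `hill` it detours around the 1000 m cell, so the topographic distance exceeds the straight-line one. The Euclidean value is in degrees rather than kilometres and its distortion grows with latitude, which is one reason it only approximates the separation between communities.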
Funder
Horizon Europe Framework Programme
Deutsche Forschungsgemeinschaft