Identifying Landscape Relevant Natural Language using Actively Crowdsourced Landscape Descriptions and Sentence-Transformers

Published:2023-01-20 Issue:1 Volume:37 Page:55-67
ISSN:0933-1875
Container-title:KI - Künstliche Intelligenz
language:en
Short-container-title:Künstl Intell

Author:

Baer Manuel F., Purves Ross S.

Abstract

Natural language has proven to be a valuable source of data for various scientific inquiries, including landscape perception and preference research. However, large, high-quality, landscape-relevant corpora are scarce. Here we propose and discuss a natural language processing workflow to identify landscape-relevant documents in large collections of unstructured text. Using a small, curated, high-quality collection of actively crowdsourced landscape descriptions, we identify and extract similar documents from two different corpora (Geograph and WikiHow) using sentence-transformers and cosine similarity scores. We show that 1) sentence-transformers combined with cosine similarity calculations successfully identify similar documents in both Geograph and WikiHow, effectively opening the door to the creation of new landscape-specific corpora; 2) the proposed sentence-transformer approach outperforms traditional Term Frequency - Inverse Document Frequency (TF-IDF) based approaches; and 3) the identified documents capture topics similar to those in the original high-quality collection. The presented workflow is transferable to various scientific disciplines in need of domain-specific natural language corpora as underlying data.
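The retrieval step described in the abstract can be illustrated with a minimal sketch. It assumes each document has already been turned into an embedding vector (in practice, for example, by a sentence-transformer model); the toy vectors, the function names `cosine_similarity` and `select_similar`, the 0.7 threshold, and the choice to score each candidate by its best match against the reference collection are all illustrative assumptions, not details taken from the paper.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def select_similar(reference_vecs, candidate_vecs, threshold=0.7):
    """Return (candidate_index, score) pairs, highest score first.

    A candidate's score is its highest cosine similarity to any reference
    vector; candidates scoring below `threshold` are dropped.
    Both the aggregation and the threshold are assumptions of this sketch.
    """
    hits = []
    for i, cand in enumerate(candidate_vecs):
        best = max(cosine_similarity(cand, ref) for ref in reference_vecs)
        if best >= threshold:
            hits.append((i, best))
    return sorted(hits, key=lambda h: h[1], reverse=True)


# Toy stand-ins for embeddings of the curated landscape descriptions ...
reference_vecs = [[1.0, 0.0, 0.0], [0.8, 0.6, 0.0]]
# ... and for embeddings of documents from a larger, unfiltered corpus.
candidate_vecs = [[0.9, 0.1, 0.0], [0.0, 0.0, 1.0], [0.6, 0.8, 0.0]]

hits = select_similar(reference_vecs, candidate_vecs, threshold=0.7)
for index, score in hits:
    print(f"candidate {index}: similarity {score:.3f}")
```

With real data, the threshold would need to be chosen empirically, since a suitable cut-off depends on the embedding model and on the corpus being filtered.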

Funder

URPP - Language and Space

University of Zurich

Publisher

Springer Science and Business Media LLC

Subject

Artificial Intelligence

Link

https://link.springer.com/content/pdf/10.1007/s13218-022-00793-3.pdf
