Traits for Efficient Navigation and Search in Natural History Collections-Reference-Cited by-同舟云学术

Traits for Efficient Navigation and Search in Natural History Collections

Published:2024-08-16 Issue: Volume:8 Page:
ISSN:2535-0897
Container-title:Biodiversity Information Science and Standards
language:
Short-container-title:BISS

Author:

Saliba Elie^ORCID,Chenin Eric,Vignes Lebbe Régine^ORCID

Abstract

The application of AI methods is increasingly fueling research in biodiversity. One of the objectives of the French national e-COL+ project is to enable collections to benefit from the innovative contributions of image recognition and text mining. The preceding e-ReColNat project aimed to centralize all the images and data from natural history collections on a single platform (Pérez and Pignal 2013). Despite this abundance of collection-related visual media, the options available for exploring them are currently limited to the usual metadata, such as the name of the species, or the place and date of collection. AI methods offer the promise of better usability (see Ariouat et al. 2023) by extracting characteristics linked to specimens and taxa, known as traits. To go further, it is essential to identify some potential traits that AI models can be trained to recognize. To this end, scientists and curators with expertise in different taxa and conservation techniques were consulted. The taxonomic knowledge of the interviewees covers botany, zoology and paleontology. Their expertise encompasses different types of collections, such as fossils, thin sections, herbarium sheets, alcohol-preserved and dry specimens (Table 1). Some of the traits mentioned are specific to individual specimens, including visible polymorphic morpho-anatomical characteristics, such as the shape of a leaf. Another possible category of traits is related to the specific preservation state of the specimen, such as early traces of pyrite rot (see Larkin 2011) in fossils. The last main category of traits at the specimen level focuses on the presence or absence of elements or organs such as traces of soil, flowers or seeds on a plant, as a way to filter relevant specimens for given studies. These traits can be efficiently extracted using computer vision models, which are trained using corpora assembled by experts. Other traits can be deduced from species-level descriptions. These include broader characteristics than those mentioned above, such as invisible morpho-anatomy at the level of the specimen, such as the potential size of a tree. The ecology, phenology, spatial distribution and relationships with humans were also cited. Natural language processing (NLP) artificial intelligence techniques are used to extract these traits (Sahraoui et al. 2022). There is a synergy between the two AI approaches: taxon-level traits identified through text mining can also be used to train computer vision models, improving their ability to recognize these traits in images. This link between traits and species makes it possible to automatically annotate corpora on a large scale. The main issue that emerged during the interviews was the vocabulary. As an example, the notions of ‘toothed’ or ‘denticulate’ to describe a leaf margin are difficult to strictly differentiate. Moreover, some collections at the Muséum national d'Histoire naturelle (MNHN) need an upstream improvement of their current metadata (missing or weak taxonomic identification, database populating in progress), before AI-derived data can be implemented effectively. In conclusion, by systematically identifying and extracting traits relevant to navigation and search from a vast array of images, the e-Col+ project enhances the usability of French collections. Collaboration between scientists, curators and AI experts ensures the robustness and usefulness of the project's outcomes, paving the way for innovative research and application.

Publisher

Pensoft Publishers

Link

https://biss.pensoft.net/article/134816/download/pdf/

Reference4 articles.

1. Extracting Masks from Herbarium Specimen Images Based on Object Detection and Image Segmentation Techniques

2. Pyrite Decay: cause and effect, prevention and cure;Larkin;NatSCA News,2011

3. Numériser et promouvoir les collections d’histoire naturelle ;Pérez;Bulletin des bibliothèques de France (BBF),2013

4. NEARSIDE: Structured kNowledge Extraction frAmework from SpecIes DEscriptions