Traits for Efficient Navigation and Search in Natural History Collections

Author:

Saliba Elie, Chenin Eric, Vignes Lebbe Régine

Abstract

The application of AI methods is increasingly fueling research in biodiversity. One of the objectives of the French national e-COL+ project is to enable collections to benefit from the innovative contributions of image recognition and text mining. The preceding e-ReColNat project aimed to centralize all the images and data from natural history collections on a single platform (Pérez and Pignal 2013). Despite this abundance of collection-related visual media, the options available for exploring them are currently limited to the usual metadata, such as the name of the species, or the place and date of collection. AI methods offer the promise of better usability (see Ariouat et al. 2023) by extracting characteristics linked to specimens and taxa, known as traits. To go further, it is essential to identify candidate traits that AI models can be trained to recognize. To this end, scientists and curators with expertise in different taxa and conservation techniques were consulted. The taxonomic knowledge of the interviewees covers botany, zoology and paleontology. Their expertise encompasses different types of collections, such as fossils, thin sections, herbarium sheets, alcohol-preserved and dry specimens (Table 1). Some of the traits mentioned are specific to individual specimens, including visible polymorphic morpho-anatomical characteristics, such as the shape of a leaf. Another possible category of traits relates to the preservation state of the specimen, such as early traces of pyrite rot (see Larkin 2011) in fossils. The last main category of specimen-level traits concerns the presence or absence of elements or organs, such as traces of soil, flowers or seeds on a plant, as a way to filter relevant specimens for given studies. These traits can be efficiently extracted using computer vision models, which are trained on corpora assembled by experts.
Other traits can be deduced from species-level descriptions. These include broader characteristics than those mentioned above, such as morpho-anatomical features that are not visible on an individual specimen, for example the potential size of a tree. Ecology, phenology, spatial distribution and relationships with humans were also cited. Natural language processing (NLP) techniques are used to extract these traits (Sahraoui et al. 2022). There is a synergy between the two AI approaches: taxon-level traits identified through text mining can also be used to train computer vision models, improving their ability to recognize these traits in images. This link between traits and species makes it possible to automatically annotate corpora on a large scale. The main issue that emerged during the interviews was vocabulary. For example, the notions of ‘toothed’ and ‘denticulate’ used to describe a leaf margin are difficult to differentiate strictly. Moreover, some collections at the Muséum national d'Histoire naturelle (MNHN) need an upstream improvement of their current metadata (missing or weak taxonomic identification, database population still in progress) before AI-derived data can be implemented effectively. In conclusion, by systematically identifying and extracting traits relevant to navigation and search from a vast array of images, the e-COL+ project enhances the usability of French collections. Collaboration between scientists, curators and AI experts ensures the robustness and usefulness of the project's outcomes, paving the way for innovative research and applications.

Publisher

Pensoft Publishers

References (4 articles)

1. Extracting Masks from Herbarium Specimen Images Based on Object Detection and Image Segmentation Techniques

2. Pyrite Decay: cause and effect, prevention and cure; Larkin; NatSCA News, 2011

3. Numériser et promouvoir les collections d’histoire naturelle; Pérez; Bulletin des bibliothèques de France (BBF), 2013

4. NEARSIDE: Structured kNowledge Extraction frAmework from SpecIes DEscriptions
