Abstract
Global digitization efforts have archived millions of specimen scans from herbarium collections worldwide, which are essential for studying plant evolution and biodiversity. ReColNat alone currently hosts over 10 million images. However, analyzing datasets of this size poses major challenges for botanical research. The application of deep learning to biodiversity analyses, particularly to herbarium scans, has shown promising results across numerous tasks (Ariouat et al. 2023, Ariouat et al. 2024, Groom et al. 2023, Sahraoui et al. 2023).
Within the e-Col+ project (ANR-21-ESRE-0053), we are developing multiple deep learning models aimed at identifying plant morphological traits. We have developed pipelines and models for cleaning, analyzing, and transforming herbarium images, including models for: i) detecting non-vegetal elements, such as barcodes, envelopes, labels, etc.; ii) detecting plant organs, including leaves, flowers, fruits, etc.; and iii) segmenting images to recognize plant parts for image cleaning. We are also developing models for classification tasks related to various morphological traits.
To validate these models, improve their generalization, and make them easily usable by end-users, deploying them within a generic platform is crucial. The generic platform called PlantAI, currently under development by the e-Col+ project, should enable easy deployment during development for testing and allow users to load annotations for new traits in order to train a model and add it to the existing catalog. The platform is based on a microservice architecture, allowing users to upload images, create custom datasets, and access various AI models for image analysis.
The platform is composed of four main modules, as illustrated in Fig. 1. The first module is the collaborative workspace manager, which allows users to create projects and image datasets and invite other users to collaborate on a project. The second module is the navigation interface and dashboards. This module integrates a search engine using metadata and AI annotations, a navigation interface between projects, datasets, and specimens, as well as dashboards for analysis across datasets, specimens, and AI models.
The third module is the dataset manager, which handles metadata and annotations associated with the specimens. These annotations can be produced either by expert users or by AI models. The fourth module is the AI model manager, which makes models available for generating AI annotations of specimens. During the development lifecycle of an AI model, users can create datasets and annotate them with AI models. These annotations can be in one of two states: validated by experts or non-validated. Users collaborating on a project can flag errors in the model predictions and leave comments to explain their evaluations. These expert corrections can then be used to retrain the models and thus improve their performance.
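The annotation lifecycle described above can be illustrated with a minimal Python sketch. It is not the PlantAI implementation: the class names, field names, specimen identifiers, and trait labels are all hypothetical, and the rule that an expert correction also marks the annotation as validated is an assumption made for illustration.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional, Tuple


class Source(Enum):
    EXPERT = "expert"
    AI_MODEL = "ai_model"


class State(Enum):
    NON_VALIDATED = "non_validated"
    VALIDATED = "validated"


@dataclass
class Annotation:
    """One trait annotation on one specimen (hypothetical schema)."""
    specimen_id: str
    trait: str
    label: str
    source: Source
    state: State = State.NON_VALIDATED
    corrected_label: Optional[str] = None
    comments: List[str] = field(default_factory=list)

    def validate(self) -> None:
        # An expert confirms the prediction as it stands.
        self.state = State.VALIDATED

    def correct(self, new_label: str, comment: str) -> None:
        # An expert flags an error, records the right label and explains why.
        # Assumption: a corrected annotation counts as validated.
        self.corrected_label = new_label
        self.comments.append(comment)
        self.state = State.VALIDATED


def retraining_examples(annotations: List[Annotation]) -> List[Tuple[str, str, str]]:
    """Keep only validated annotations, preferring the expert's corrected label."""
    return [
        (a.specimen_id, a.trait, a.corrected_label or a.label)
        for a in annotations
        if a.state is State.VALIDATED
    ]


# Three AI predictions on made-up specimens.
predictions = [
    Annotation("SPEC-001", "leaf_shape", "ovate", Source.AI_MODEL),
    Annotation("SPEC-002", "leaf_shape", "ovate", Source.AI_MODEL),
    Annotation("SPEC-003", "leaf_shape", "linear", Source.AI_MODEL),
]
predictions[0].validate()
predictions[1].correct("lanceolate", "Blade is clearly longer than wide.")
# predictions[2] is never reviewed, so it stays non-validated.

training_set = retraining_examples(predictions)
```

In this sketch, only the two reviewed predictions reach `training_set`, and the corrected one carries the expert's label rather than the model's. Keeping the original label alongside the correction is a design choice that would also let a dashboard report how often experts disagree with a model.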
This platform will be highly beneficial for botanists, enhancing the efficiency and effectiveness of biodiversity analyses from herbarium scans. We aim to provide users with a catalog of AI models through this platform and allow them to import their own datasets, annotated for traits of their choice. Users will be able to select a model from the AI model catalog and train it on their dataset. The resulting model will then be deployed automatically and made available for AI annotation. The annotations it produces will in turn appear automatically in the filtering and navigation interface, so that AI annotations are integrated dynamically without manual intervention.