
Species Detection and Segmentation of Multi-specimen Historical Herbaria

Published: 2021-09-07
Volume: 5
ISSN: 2535-0897
Journal: Biodiversity Information Science and Standards (BISS)

Authors:

Thirukokaranam Chandrasekar Krishna Kumar, Milleville Kenzo, Verstockt Steven

Abstract

Historically, herbarium specimens have provided users with documented occurrences of plants in specific locations over time. Herbarium collections have therefore been the basis of systematic botany for centuries (Younis et al. 2020). According to the latest summary report based on data from Index Herbariorum, there are around 3,400 active herbaria in the world, containing 397 million specimens spread across 182 countries (Thiers 2021). The exponential growth of high-quality image-capturing devices, together with the enormous number of collections that have not yet been digitised, has further led to rising interest in large-scale digitisation initiatives across the world (Le Bras et al. 2017). As herbarium specimens are increasingly digitised and made accessible in online repositories, an important need has also emerged for automated tools that process and enrich these collections to facilitate better access to the preserved archives. The rising number of digitised herbarium sheets provides an opportunity to employ computer-based image processing techniques, such as deep learning, to automatically identify species and higher taxa (Carranza-Rojas and Joly 2018, Carranza-Rojas et al. 2017, Younis et al. 2020) or to extract other useful information from herbarium sheets, such as handwritten text, color bars, scales and barcodes. Species identification works well for herbarium sheets that contain only one species per page. However, many herbarium books have multiple species on the same page (as shown in Fig. 1), for which the complexity of the identification problem increases tremendously. Enriching such pages manually also takes a great deal of time and effort. In this work, we propose a pipeline that can automatically detect, identify, and enrich plant species in multi-specimen herbaria.
The core idea of the pipeline is to detect each distinct plant species on a sheet, together with the handwritten text around it, and to map that text to the correct plant species. As shown in Fig. 2, the pipeline begins by pre-processing the images: each image is rotated and aligned so that its longest edge becomes its height. In the case of herbarium books, the pages are first detected, and morphological transformations are applied to reduce occlusions (Thirukokaranam Chandrasekar and Verstockt 2020). A YOLOv3 (You Only Look Once version 3) object detection model (Zhao and Li 2020) is then trained from scratch to detect plants and text. The model was trained on a dataset of single-species herbarium sheets, using mosaic augmentation to extend the plant model to sheets containing multiple species. The first training results are impressive, although the model could be further improved with more labelled data. We also plan to train an object segmentation model and compare its performance with that of the plant detection model on multi-specimen herbarium sheets. After the plants and the text have been detected, the text will be recognized with a state-of-the-art handwritten text recognition (HTR) model. The recognized text can then be matched against a database of specimens to identify each detected specimen. Furthermore, additional textual metadata visible on the sheet (e.g. date, locality, collector's name, institution) will be recognized and used to enrich the collection.
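The orientation, augmentation and text-to-plant mapping steps described above can be illustrated with a minimal, dependency-free Python sketch. It rests on assumptions that are not taken from the authors' implementation: images are plain lists of pixel rows, landscape images are rotated clockwise (the abstract does not state a direction), mosaic augmentation is simplified to a fixed 2×2 tiling of four equally sized sheets (common mosaic implementations also rescale and randomly crop), and each text box is assigned to the plant box with the nearest centre, a heuristic chosen only for illustration. The names `Box`, `to_portrait`, `mosaic_2x2` and `assign_text_to_plants` are hypothetical.

```python
from dataclasses import dataclass
from math import hypot

# Hypothetical representation: a grayscale image as a list of pixel rows.
Image = list[list[int]]


@dataclass(frozen=True)
class Box:
    """Axis-aligned bounding box: (x0, y0) is top-left, (x1, y1) bottom-right."""
    x0: float
    y0: float
    x1: float
    y1: float

    def center(self) -> tuple[float, float]:
        return ((self.x0 + self.x1) / 2, (self.y0 + self.y1) / 2)

    def shifted(self, dx: float, dy: float) -> "Box":
        return Box(self.x0 + dx, self.y0 + dy, self.x1 + dx, self.y1 + dy)


def to_portrait(img: Image) -> Image:
    """Rotate a landscape image 90 degrees clockwise so its longest edge is the height.

    The rotation direction is an assumption; portrait (or empty) images are returned unchanged.
    """
    if not img or len(img[0]) <= len(img):
        return img
    # Reversing the row order and then transposing yields a clockwise rotation.
    return [list(row) for row in zip(*img[::-1])]


def mosaic_2x2(tiles: list[tuple[Image, list[Box]]]) -> tuple[Image, list[Box]]:
    """Simplified mosaic augmentation: tile four equally sized single-specimen
    sheets into one 2x2 multi-specimen sheet and shift their boxes accordingly."""
    if len(tiles) != 4:
        raise ValueError("expected exactly four tiles")
    h, w = len(tiles[0][0]), len(tiles[0][0][0])
    if any(len(img) != h or any(len(row) != w for row in img) for img, _ in tiles):
        raise ValueError("all tiles must have the same size")
    canvas = [[0] * (2 * w) for _ in range(2 * h)]
    boxes: list[Box] = []
    for k, (img, img_boxes) in enumerate(tiles):
        oy, ox = (k // 2) * h, (k % 2) * w  # row-major tile offset
        for y, row in enumerate(img):
            canvas[oy + y][ox:ox + w] = row  # copy the row into the canvas
        boxes.extend(b.shifted(ox, oy) for b in img_boxes)
    return canvas, boxes


def assign_text_to_plants(plants: list[Box], texts: list[Box]) -> dict[int, int]:
    """Map each text-box index to the index of the plant box with the nearest centre."""
    if not plants:
        return {}
    centres = [p.center() for p in plants]
    mapping: dict[int, int] = {}
    for t_idx, text in enumerate(texts):
        tx, ty = text.center()
        mapping[t_idx] = min(
            range(len(centres)),
            key=lambda i: hypot(tx - centres[i][0], ty - centres[i][1]),
        )
    return mapping


if __name__ == "__main__":
    plants = [Box(0, 0, 10, 10), Box(100, 0, 110, 10)]
    labels = [Box(0, 12, 10, 16), Box(95, 12, 115, 16)]
    print(assign_text_to_plants(plants, labels))  # {0: 0, 1: 1}
```

Nearest-centre assignment is simply the easiest rule that gives every text block exactly one plant; on crowded sheets, a distance to box edges or a constraint that labels sit below their plant may be more robust.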

Publisher

Pensoft Publishers

Link

https://biss.pensoft.net/article/74060/download/pdf/

References (7 articles)

1. Going deeper in the automated identification of Herbarium specimens; Carranza-Rojas et al.; BMC Evolutionary Biology, 2017

2. The French Muséum National d’Histoire Naturelle vascular plant herbarium collection dataset; Le Bras et al.; Scientific Data, 2017

3. Page Boundary Extraction of Bound Historical Herbaria

Cited by 2 articles.

1. DiSSCo Flanders: A regional natural science collections management infrastructure in an international context; Biodiversity Information Science and Standards, 2022-09-07

2. Applications of computer vision and machine learning techniques for digitized herbarium specimens: A systematic literature review; Ecological Informatics, 2022-07