Abstract
Metadata are key descriptors of research data, particularly for researchers seeking to apply machine learning (ML) to the vast collections of digitized specimens. Unfortunately, the available metadata are often sparse and, at times, erroneous. Additionally, it is prohibitively expensive to address these limitations through traditional, manual means. This paper reports on research that applies machine-driven approaches to analyzing digitized fish images and extracting various important features from them. The digitized fish specimens are being analyzed as part of the Biology Guided Neural Networks (BGNN) initiative, which is developing a novel class of artificial neural networks using phylogenies and anatomy ontologies. Automatically generated metadata are crucial for identifying the high-quality images needed for the neural network’s predictive analytics. Methods that combine ML and image informatics techniques allow us to rapidly enrich the existing metadata associated with the 7,244 images from the Illinois Natural History Survey (INHS) used in our study. Results show we can accurately generate many key metadata properties relevant to the BGNN project, as well as general image quality metrics (e.g., brightness and contrast). Results also show that we can accurately generate bounding boxes and segmentation masks for fish, which are needed for subsequent machine learning analyses. The automatic process outperforms humans in terms of time and accuracy, and provides a novel solution for leveraging digitized specimens in ML. This research demonstrates the ability of computational methods to enhance the digital library services associated with the tens of thousands of digitized specimens stored in open-access repositories worldwide.
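As an illustration of the general image quality metrics mentioned above, the following is a minimal sketch of how brightness and contrast could be computed for a single image. The abstract does not state the definitions used in the study, so the choices here are assumptions: brightness is taken as the mean luma, contrast as the standard deviation of luma (RMS contrast), and luma is computed with Rec. 601 weights. The function name `brightness_contrast` and the synthetic test image are hypothetical.

```python
import numpy as np


def brightness_contrast(rgb):
    """Return (brightness, contrast) for an H x W x 3 uint8 RGB array.

    Assumed definitions (not taken from the paper):
      brightness = mean luma, scaled to [0, 1]
      contrast   = standard deviation of luma (RMS contrast)
    """
    scaled = np.asarray(rgb, dtype=np.float64) / 255.0
    # Rec. 601 luma weights; a common convention, chosen here as an assumption.
    luma = scaled @ np.array([0.299, 0.587, 0.114])
    return float(luma.mean()), float(luma.std())


# Synthetic stand-in for a specimen image: left half black, right half white.
image = np.zeros((4, 4, 3), dtype=np.uint8)
image[:, 2:] = 255
b, c = brightness_contrast(image)

# A uniformly gray image has no variation, so its contrast is zero.
flat = np.full((4, 4, 3), 128, dtype=np.uint8)
flat_b, flat_c = brightness_contrast(flat)
```

Values computed this way could be stored alongside the existing metadata record for each image and used to filter out images that are too dark or too flat, though the thresholds for such filtering would need to be chosen per collection.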
Publisher
Cold Spring Harbor Laboratory