Towards operational phytoplankton recognition with automated high-throughput imaging, near-real-time data processing, and convolutional neural networks

Published: 2022-09-02 Volume: 9
ISSN: 2296-7745
Container-title: Frontiers in Marine Science
Short-container-title: Front. Mar. Sci.

Author:

Kraft Kaisa, Velhonoja Otso, Eerola Tuomas, Suikkanen Sanna, Tamminen Timo, Haraguchi Lumi, Ylöstalo Pasi, Kielosto Sami, Johansson Milla, Lensu Lasse, Kälviäinen Heikki, Haario Heikki, Seppälä Jukka

Abstract

Plankton communities form the basis of aquatic ecosystems, and elucidating their role in increasingly important environmental issues is a persistent research question. Recent technological advances in automated microscopic imaging, together with cloud platforms for high-performance computing, have created possibilities for collecting and processing detailed high-frequency data on planktonic communities, opening new horizons for testing core hypotheses in aquatic ecosystems. Analyzing continuous streams of big data calls for the development and deployment of novel computer vision and machine learning systems. The implementation of these analysis systems is not always straightforward with regard to operationality, and issues regarding data flows, computing, and data treatment need to be considered. We created a data pipeline for automated near-real-time classification of phytoplankton during remote deployment of an imaging flow cytometer (Imaging FlowCytobot, IFCB). A convolutional neural network (CNN) classifies the continuous imaging data, and probability thresholds filter out images that do not belong to our existing classes. The automated data flow and classification system were used to monitor the dominant species of filamentous cyanobacteria on the coast of Finland during summer 2021. We demonstrate that good phytoplankton recognition can be achieved with transfer learning by taking a relatively shallow, publicly available, pre-trained CNN model and fine-tuning it with community-specific phytoplankton images (overall F1-score of 0.95 for a test set of our labeled image data complemented with a 50% unclassifiable image portion). This enables both fast training and low computing resource requirements for model deployment, making the system easy to modify and applicable in a wide range of situations. The system performed well when used to classify a natural phytoplankton community over different seasons (overall F1-score of 0.82 for our evaluation data set).
Furthermore, we address the key challenges of image classification for varying planktonic communities and analyze the practical implications of confused classes. We published our labeled image data set of the Baltic Sea phytoplankton community for the training of image recognition models (~63000 images in 50 classes) to accelerate the implementation of imaging systems for other brackish and freshwater communities. Our evaluation data set, 59 fully annotated samples of natural communities throughout an annual cycle, is also available for model testing purposes (~150000 images).
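
The abstract describes two steps that a short example can make concrete: rejecting images whose top class probability falls below a threshold, and scoring the result with an F1-score. The following is a minimal sketch under stated assumptions, not the authors' implementation. The class names, threshold values, the `unclassifiable` label, and both function names are hypothetical, and the abstract does not say whether the thresholds are shared or set per class.

```python
from typing import Dict, List, Sequence

# Hypothetical label for images rejected by the probability threshold.
UNCLASSIFIABLE = "unclassifiable"


def classify_with_threshold(
    probs: Sequence[float],
    class_names: Sequence[str],
    thresholds: Dict[str, float],
    default_threshold: float = 0.5,
) -> str:
    """Return the most probable class, or UNCLASSIFIABLE if its
    probability is below that class's threshold."""
    best = max(range(len(probs)), key=lambda i: probs[i])
    name = class_names[best]
    if probs[best] >= thresholds.get(name, default_threshold):
        return name
    return UNCLASSIFIABLE


def f1_for_class(y_true: List[str], y_pred: List[str], label: str) -> float:
    """F1 = 2*TP / (2*TP + FP + FN) for a single class label."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == label and p == label for t, p in pairs)
    fp = sum(t != label and p == label for t, p in pairs)
    fn = sum(t == label and p != label for t, p in pairs)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Placeholder class names and CNN output probabilities (assumed values).
class_names = ["taxon_a", "taxon_b", "taxon_c"]
thresholds = {"taxon_b": 0.8}  # assumed per-class threshold

confident = classify_with_threshold([0.10, 0.85, 0.05], class_names, thresholds)
uncertain = classify_with_threshold([0.40, 0.35, 0.25], class_names, thresholds)
```

Here `confident` resolves to a class because its top probability clears the threshold, while `uncertain` is routed to the unclassifiable group. Raising a threshold trades recall for precision in that class, which is why F1 is a natural summary metric for this kind of filtering.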

Funder

Academy of Finland

Horizon 2020 Framework Programme

Connecting Europe Facility

Publisher

Frontiers Media SA

Subject

Ocean Engineering, Water Science and Technology, Aquatic Science, Global and Planetary Change, Oceanography

Reference: 66 articles.

1. Responses of the coastal phytoplankton community to tropical cyclones revealed by high-frequency imaging flow cytometry; Anglès; Limnol. Oceanogr., 2015

2. Influence of coastal upwelling and river discharge on the phytoplankton community composition in the northwestern Gulf of Mexico; Anglès; Progr. Oceanogr., 2019

3. Random forests; Breiman; Mach. Learn., 2001