Applying convolutional neural networks to speed up environmental DNA annotation in a highly diverse ecosystem
Published: 2022-06-17
Issue: 1
Volume: 12
Page:
ISSN: 2045-2322
Container-title: Scientific Reports
Short-container-title: Sci Rep
Language: en
Author:
Flück Benjamin, Mathon Laëtitia, Manel Stéphanie, Valentini Alice, Dejean Tony, Albouy Camille, Mouillot David, Thuiller Wilfried, Murienne Jérôme, Brosse Sébastien, Pellissier Loïc
Abstract
High-throughput DNA sequencing is becoming an increasingly important tool to monitor and better understand biodiversity responses to environmental changes in a standardized and reproducible way. Environmental DNA (eDNA) from organisms can be captured in ecosystem samples and sequenced using metabarcoding, but processing large volumes of eDNA data and annotating sequences to recognized taxa remains computationally expensive. Speed and accuracy are two major bottlenecks in this critical step. Here, we evaluated the ability of convolutional neural networks (CNNs) to process short eDNA sequences and associate them with taxonomic labels. Using a unique eDNA data set collected in highly diverse Tropical South America, we compared the speed and accuracy of CNNs with that of a well-known bioinformatic pipeline (OBITools) in processing a small region (60 bp) of the 12S ribosomal DNA targeting freshwater fishes. We found that the taxonomic labels from the CNNs were comparable to those from OBITools, with high correlation levels for the composition of the regional fish fauna. The CNNs enabled the processing of raw fastq files at a rate of approximately 1 million sequences per minute, which was about 150 times faster than with OBITools. Given the good performance of CNNs in the highly diverse ecosystem considered here, the development of more elaborate CNNs promises fast deployment for future biodiversity inventories using eDNA.
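A CNN cannot consume a nucleotide string directly, so short reads such as the 60 bp marker described above are commonly turned into a fixed-size numeric matrix first. The sketch below shows one common way to do that, one-hot encoding. It is an illustration only and not taken from the paper: the function name `one_hot_encode`, the A/C/G/T column order, and the zero-row handling of ambiguous bases and padding are all assumptions.

```python
# Illustrative sketch only: the helper name, the column order and the
# treatment of ambiguous bases are assumptions, not the authors' method.
BASES = "ACGT"
BASE_INDEX = {base: i for i, base in enumerate(BASES)}


def one_hot_encode(sequence, length=60):
    """Return a `length` x 4 matrix (list of rows) for a DNA sequence.

    Each row has a single 1.0 in the column of its base. Unknown symbols
    (for example N) and padding positions are encoded as all-zero rows.
    Sequences longer than `length` are truncated.
    """
    rows = []
    for base in sequence.upper()[:length]:
        row = [0.0] * len(BASES)
        index = BASE_INDEX.get(base)
        if index is not None:
            row[index] = 1.0
        rows.append(row)
    # Pad short reads with zero rows so every input has the same shape.
    rows.extend([0.0] * len(BASES) for _ in range(length - len(rows)))
    return rows


encoded = one_hot_encode("ACGTN", length=8)
print(len(encoded), len(encoded[0]))
```

A matrix of this shape can then be passed to a one-dimensional convolutional layer in a framework of choice, with the four base columns acting as input channels.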
Publisher
Springer Science and Business Media LLC
Subject
Multidisciplinary
Cited by
6 articles.