METAPLANTCODE: Harmonizing Plant Metabarcoding Pipelines in Europe

Author:

Gardette Auguste, Sklab Youcef, Belda Eugeni, Chenin Eric, Zucker Jean-Daniel

Abstract

The METAPLANTCODE project is dedicated to advancing and optimizing pan-European case studies on metabarcoding. Its objectives include providing best-practice recommendations, optimizing analysis pipelines for species identification, and creating user-friendly reference databases. To accomplish these objectives, METAPLANTCODE will:

- identify and address gaps in current methodologies;
- publish best-practice documents on FAIR (Findable, Accessible, Interoperable, Reusable) publishing of plant metabarcoding data to GBIF (Global Biodiversity Information Facility) and the INSDC (International Nucleotide Sequence Database Collaboration);
- implement ELIXIR-compatible multimodal deep learning (DL) models in novel tools for standalone metabarcoding analyses that draw on various data sources.

A major focus of the project is improving species identification accuracy using GBIF records and metadata. This involves mapping regional, national, and international botanical taxonomic checklists, red lists, and floras to the Catalogue of Life (COL) via the COL ChecklistBank. In addition, taxonomic and floristic literature will be semantically enriched with named entity recognition and relationship extraction modules. These modules will support species identification based on domain-specific descriptive and phenotypic features. An interface will link taxonomic names to treatments, identify homonyms and synonyms, and facilitate the conversion and annotation of floras, red lists, and ecological treatments. All METAPLANTCODE products will adhere to FAIR standards by the end of the project. The project emphasizes knowledge transfer from the outset, engaging with associated partners and stakeholders: key stakeholders will be identified, priorities set, and communication channels established, monitored, and adjusted as necessary.
Efforts to enhance stakeholder engagement, training, and outreach aim to make plant metabarcoding a routine standard for biodiversity monitoring in Europe and beyond.

Deep Learning for Plant Metabarcoding

Within the METAPLANTCODE project, our team is tasked with improving taxonomic precision by applying deep learning to metabarcoding data and metadata. Previous studies have shown that deep learning can be applied to non-plant barcoding data and is computationally efficient compared with traditional bioinformatics approaches (Flück et al. 2022).

Deep Learning Models for Metabarcoding Data

Our approach evaluates several deep learning architectures on plant barcoding datasets:

- Convolutional Neural Networks (CNN) (LeCun et al. 2015);
- Transformer models (Vaswani et al. 2017);
- Hyena (Poli et al. 2023);
- Mamba (Gu et al. 2023).

Preliminary results will be presented on the application of these models and on the proposed ensemble method (Mohammed and Kora 2023). The ensemble combines multiple barcode sequence representations and learning strategies. When integrated with classical machine learning models such as logistic regression and Support Vector Machines (SVM) (Noble 2006), it is expected to be more precise and robust than any individual model (Fig. 1).

Multimodal Refinement of Predictions

In the subsequent phase, we aim to refine genetic sequence classifications with a multimodal strategy that integrates genetic information with traditional botanical knowledge. The METAPLANTCODE project will provide biological interaction lists (e.g., species-species, species-habitat), and we will use them to train a large language model (LLM) on relevant scientific literature. This LLM, tailored to plant biodiversity, will incorporate metadata associated with genetic samples (including location, sampling time, and climatic conditions).
By merging embeddings of both metadata and genetic data, we aim to improve the accuracy of taxonomic predictions (Fig. 2).

Conclusion

Through this research, we aim to develop an effective method for integrating genetic data with textual information from various sources. We anticipate that this approach will improve plant metabarcoding. We also expect it to apply to other barcoding fields, such as those targeting bacteria, fish, and fungi. Beyond barcoding, we expect the methodology to find broader applications in genomic research across diverse biological disciplines.
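A minimal, stdlib-only sketch may make the ensemble idea from Deep Learning Models for Metabarcoding Data more concrete. It rests on three assumptions:

- "Multiple barcode sequence representations" are illustrated by k-mer frequency profiles of different lengths (k = 2, 3, 4).
- A nearest-centroid classifier stands in for the CNN, Transformer, Hyena, and Mamba models.
- The base learners are combined by soft voting (averaging class probabilities), a swapped-in simplification rather than the logistic regression or SVM combiner mentioned in the text.

The names `kmer_profile`, `CentroidClassifier`, and `ensemble_predict`, the toy sequences, and the taxon labels are all hypothetical.

```python
import math
from collections import Counter
from itertools import product


def kmer_profile(seq, k):
    """Unit-length k-mer frequency vector over ACGT; k-mers with ambiguous bases are ignored."""
    seq = seq.upper()
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    vec = [counts[km] for km in kmers]
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


class CentroidClassifier:
    """Hypothetical base learner: cosine similarity to each taxon's mean k-mer profile."""

    def __init__(self, k):
        self.k = k
        self.centroids = {}

    def fit(self, seqs, labels):
        sums, n = {}, Counter(labels)
        for seq, label in zip(seqs, labels):
            vec = kmer_profile(seq, self.k)
            acc = sums.setdefault(label, [0.0] * len(vec))
            for i, x in enumerate(vec):
                acc[i] += x
        self.centroids = {lab: [x / n[lab] for x in acc] for lab, acc in sums.items()}
        return self

    def predict_proba(self, seq, temperature=0.1):
        vec = kmer_profile(seq, self.k)
        sims = {}
        for label, c in self.centroids.items():
            c_norm = math.sqrt(sum(x * x for x in c)) or 1.0
            sims[label] = sum(a * b for a, b in zip(vec, c)) / c_norm
        # Softmax over cosine similarities turns scores into class probabilities.
        top = max(sims.values())
        exps = {lab: math.exp((s - top) / temperature) for lab, s in sims.items()}
        z = sum(exps.values())
        return {lab: e / z for lab, e in exps.items()}


def ensemble_predict(models, seq):
    """Soft voting: average the class probabilities of all base learners."""
    avg = Counter()
    for model in models:
        for label, p in model.predict_proba(seq).items():
            avg[label] += p / len(models)
    return max(avg, key=avg.get), dict(avg)


# Toy training data: two hypothetical taxa with contrasting base composition.
train_seqs = ["ACGTACGTACGTAAAC", "ACGTACGTACGAAAAC", "GGGCCCGGGCCCTTTG", "GGGCCCGGGCCGTTTG"]
train_labels = ["taxon_A", "taxon_A", "taxon_B", "taxon_B"]
# One base learner per sequence representation (k-mer length).
models = [CentroidClassifier(k).fit(train_seqs, train_labels) for k in (2, 3, 4)]
label, probs = ensemble_predict(models, "ACGTACGTACGTAAAA")
print(label)  # taxon_A
```

Averaging is the simplest combiner. A stacking variant would instead feed each base learner's probabilities, computed on held-out data, to a logistic regression or SVM. That lets the ensemble learn how much to trust each representation.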
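The embedding-merging step from Multimodal Refinement of Predictions can also be sketched, assuming simple early fusion by concatenation. In this sketch:

- A k-mer profile stands in for a learned sequence embedding.
- A hand-built vector stands in for the LLM-derived metadata embedding the project describes. It holds scaled latitude and longitude, a sine/cosine encoding of the day of year, and scaled temperature.
- The function names, scaling constants, coordinates, and species labels are all hypothetical.

The example shows the motivating case: two reference taxa that share an identical barcode, which sequence data alone cannot separate.

```python
import math
from collections import Counter
from itertools import product


def kmer_embedding(seq, k=3):
    """Unit-length k-mer frequency vector; a stand-in for a learned DNA encoder."""
    seq = seq.upper()
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    vec = [counts[km] for km in kmers]
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def metadata_embedding(lat, lon, day_of_year, temp_c):
    """Hypothetical metadata encoder: scaled coordinates, cyclic season, scaled temperature."""
    angle = 2 * math.pi * day_of_year / 365.0  # sin/cos keep 31 Dec next to 1 Jan
    return [lat / 90.0, lon / 180.0, math.sin(angle), math.cos(angle), temp_c / 40.0]


def fuse(seq_vec, meta_vec, meta_weight=1.0):
    """Early fusion: concatenate the sequence vector with the weighted metadata vector."""
    return seq_vec + [meta_weight * x for x in meta_vec]


def nearest_label(query, references):
    """Return the label of the closest reference (Euclidean distance) in fused space."""
    return min(references, key=lambda ref: math.dist(query, ref[0]))[1]


# Two hypothetical taxa sharing one barcode but sampled in different regions.
barcode = "ATGCGATACGTAGCTAGCTAGGCTA"
seq_vec = kmer_embedding(barcode)
refs = [
    (fuse(seq_vec, metadata_embedding(46.5, 8.0, 200, 12.0)), "species_alpine"),
    (fuse(seq_vec, metadata_embedding(37.5, 14.0, 200, 28.0)), "species_mediterranean"),
]
query = fuse(kmer_embedding(barcode), metadata_embedding(38.0, 15.0, 190, 27.0))
print(nearest_label(query, refs))  # species_mediterranean
```

The `meta_weight` parameter controls how strongly metadata can override sequence similarity. In a trained multimodal model, this balance would be learned rather than set by hand.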

Publisher

Pensoft Publishers
