Enabling Community Curation of Biological Source Annotations of Molecular Data Through PlutoF and the ELIXIR Contextual Data Clearinghouse

Author:

Balavenkataraman Kadhirvelu Vishnukumar,Abarenkov KessyORCID,Zirk Allan,Paupério JoanaORCID,Cochrane Guy,Jayathilaka Suran,Bánki Olaf,Lanfear JerryORCID,Ivanov Filipp,Piirmann Timo,Pöhönen Raivo,Kõljalg Urmas

Abstract

The advancements in sequencing technologies have greatly contributed to the documentation of Earth’s biodiversity. However, for exploring the full potential of molecular resources for biodiversity, there needs to be a good linkage between sequence data and its biological source, contributing to a network of connected data in the biodiversity research cycle. This requires a foundation of well-structured and accessible annotations in the molecular sequence repositories. The International Nucleotide Sequence Database Collaboration (INSDC), of which the European Nucleotide Archive (ENA) is its European node, holds a large amount of annotations associated with sequence data, relating to its biological source (e.g., specimens in natural history collections). However, for a number of records, these annotations may be incomplete (e.g., missing voucher information), ambiguous or even inaccurate. Therefore, we have implemented a workflow that allows third-party annotations to be attached to sequence and sample records using two existing services, the PlutoF platform and the ELIXIR Contextual Data ClearingHouse. This work was developed within the scope of the BiCIKL (Biodiversity Community Integrated Knowledge Library) project, which aims to establish open science practices in the biodiversity domain. PlutoF is an online data management platform that also provides computing services for biology-related research. PlutoF features allow registered users to enter their own data and access public data at INSDC. Users can enter and manage a range of data, as taxonomic classifications, occurrences, etc. This platform also includes a module that allows the addition of third-party annotations (on material source, taxonomic identification, etc.) linked to specimens or sequence records. This module was already in use by the UNITE community for annotation of INSDC rDNA Internal Transcribed Spacer sequence datasets (Abarenkov et al. 2021). These UNITE annotations are displayed in the National Centre for Biotechnology Information (NCBI) records through links to the PlutoF platform. However, there was the need for an automated solution that allowed third-party annotations to any sequence or sample record at INSDC. This was implemented through the operation of the ELIXIR Contextual Data ClearingHouse (hereafter as Clearinghouse). The Clearinghouse holds a simple RESTful Application Programming Interface (API) to support the submission of additions and improvements to current metadata attributes, such as information on material sources, on records publicly available in the ELIXIR data resources. The Clearinghouse enables the submission of these corrected metadata from databases (such as the PlutoF platform) to the primary data repositories. The workflow developed is shown in Fig. 1 and consists of the following steps: i) users annotate sequence metadata that is regularly downloaded from INSDC using NCBI’s E-utilities; ii) an annotation proposal is created and a verification notification is sent to an assigned reviewer; iii) the reviewer evaluates the annotation proposal and accepts it or rejects it with comments; iv) if the annotation proposal is accepted, the annotated fields that may be mapped to ENA fields are then pushed to the Clearinghouse using their RESTful API. The annotations when received at ENA are then reviewed before being displayed. This workflow is implemented through a web interface in PlutoF, which allows user-friendly and effortless reporting of corrections or additions to biological source metadata in sequence records. Overall, we expect this tool to contribute to the enrichment of metadata associated with sequence records, and therefore increase the links between the molecular and biodiversity resources, and enable sequencing data to deliver their full potential for biodiversity conservation.

Publisher

Pensoft Publishers

Subject

General Engineering

Cited by 1 articles. 订阅此论文施引文献 订阅此论文施引文献,注册后可以免费订阅5篇论文的施引文献,订阅后可以查看论文全部施引文献

1. Linking Between Molecular and Biodiversity Data: A BiCIKL Perspective;Biodiversity Information Science and Standards;2024-08-28

同舟云学术

1.学者识别学者识别

2.学术分析学术分析

3.人才评估人才评估

"同舟云学术"是以全球学者为主线,采集、加工和组织学术论文而形成的新型学术文献查询和分析系统,可以对全球学者进行文献检索和人才价值评估。用户可以通过关注某些学科领域的顶尖人物而持续追踪该领域的学科进展和研究前沿。经过近期的数据扩容,当前同舟云学术共收录了国内外主流学术期刊6万余种,收集的期刊论文及会议论文总量共计约1.5亿篇,并以每天添加12000余篇中外论文的速度递增。我们也可以为用户提供个性化、定制化的学者数据。欢迎来电咨询!咨询电话:010-8811{复制后删除}0370

www.globalauthorid.com

TOP

Copyright © 2019-2024 北京同舟云网络信息技术有限公司
京公网安备11010802033243号  京ICP备18003416号-3