Incentivising use of structured language in biological descriptions: Author-driven phenotype data and ontology production

Published:2018-11-07 Volume:6
ISSN:1314-2828
Container-title:Biodiversity Data Journal
Short-container-title:BDJ

Author:

Cui Hong, Macklin James, Sachs Joel, Reznicek Anton, Starr Julian, Ford Bruce, Penev Lyubomir, Chen Hsin-Liang

Abstract

Phenotypes are used for a multitude of purposes such as defining species, reconstructing phylogenies, diagnosing diseases or improving crop and animal productivity, but most of this phenotypic data is published in free-text narratives that are not computable. This means that the complex relationship between the genome, the environment and phenotypes is largely inaccessible to analysis, and important questions related to the evolution of organisms, their diseases or their response to climate change cannot be fully addressed. It takes great effort to manually convert free-text narratives to a computable format before they can be used in large-scale analyses. We argue that this manual curation approach is not a sustainable solution to produce computable phenotypic data for three reasons: 1) it does not scale to all of biodiversity; 2) it does not stop the publication of free-text phenotypes that will continue to need manual curation in the future; and, most importantly, 3) it does not solve the problem of inter-curator variation (curators interpret/convert a phenotype differently from each other). Our empirical studies have shown that inter-curator variation is as high as 40% even within a single project. With this level of variation, it is difficult to imagine that data integrated from multiple curation projects can be of high quality. The key causes of this variation have been identified as semantic vagueness in original phenotype descriptions and difficulties in using standardised vocabularies (ontologies). We argue that the authors describing phenotypes are the key to the solution. Given the right tools and appropriate attribution, the authors should be in charge of developing a project’s semantics and ontology. This will speed up ontology development and improve the semantic clarity of phenotype descriptions from the moment of publication. A proof-of-concept project on this idea was funded by NSF ABI in July 2017. We seek readers' input on, or critique of, the proposed approaches to help achieve community-based computable phenotype data production in the near future. Results from this project will be accessible through https://biosemantics.github.io/author-driven-production.

Publisher

Pensoft Publishers

Subject

Ecology; Ecology, Evolution, Behavior and Systematics

Link

https://bdj.pensoft.net/lib/ajax_srv/article_elements_srv.php?action=download_pdf&item_id=29616

References: 39 articles.

1. The Tenth Anniversary of Assigning DOI Names to Scientific Data and a Five Year History of DataCite

2. Socialization tactics in Wikipedia and their effects;Choi;Proceedings of the ACM Conference on Computer Supported Cooperative Work,2010

3. Implementation of data citations and persistent identifiers at the ORNL DAAC

4. CharaParser+EQ: Performance Evaluation Without Gold Standard;Cui;Proceedings of the 78th Annual Meeting of Association for Information Science and Technology,2015

Cited by 5 articles.

1. From Noisy Data to Useful Color Palettes: One Step in Making Biodiversity Data FAIR;Lecture Notes in Computer Science;2023

2. Authors’ attitude toward adopting a new workflow to improve the computability of phenotype publications;Database;2022-01-01

3. Which methods are the most effective in enabling novice users to participate in ontology creation? A usability study;Database;2021-06-01

4. Measurement Recorder: developing a useful tool for making species descriptions that produces computable phenotypes;Database;2020-01-01

5. Enabling Authors to Produce Computable Phenotype Measurements: Usability Studies on the Measurement Recorder;Communications in Computer and Information Science;2020