Experience: Automated Prediction of Experimental Metadata from Scientific Publications

Published:2021-12-31 Issue:4 Volume:13 Page:1-11
ISSN:1936-1955
Container-title:Journal of Data and Information Quality
language:en
Short-container-title:J. Data and Information Quality

Author:

Nayak Stuti¹, Zaveri Amrapali¹, Serrano Pedro Hernandez¹, Dumontier Michel¹

Affiliation:

1. Institute of Data Science, Maastricht University, Maastricht, The Netherlands

Abstract

While open biomedical data are abundant, the lack of high-quality metadata makes it challenging for others to find relevant datasets and to reuse them for another purpose. In particular, metadata are useful for understanding the nature and provenance of the data. A common approach to improving metadata quality relies on human curation, which is expensive, time-consuming, and prone to error. To improve the quality of metadata, we use scientific publications to automatically predict metadata key:value pairs. For prediction, we use a Convolutional Neural Network (CNN) and a Bidirectional Long Short-Term Memory network (BiLSTM). We focus on the NCBI Disease Corpus, which is used to train the CNN and the BiLSTM. We perform two kinds of experiments with these two architectures: (1) we predict disease names using their unique IDs in the MeSH ontology, and (2) we use the tree structure of the MeSH ontology to move up the hierarchy of these disease terms, which reduces the number of labels. We also apply various multi-label classification techniques in these experiments. We find that in both cases the CNN achieves the best results, predicting the superclasses of diseases with an accuracy of 83%.
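The label reduction in experiment (2) can be illustrated with a minimal Python sketch. It assumes that each MeSH unique ID maps to one or more dot-separated tree numbers and that truncating a tree number to its leading levels yields a superclass. The IDs, tree numbers, and helper names (`MESH_TREE_NUMBERS`, `superclasses`, `multi_hot`) are hypothetical and chosen only for illustration; this is not necessarily how the authors implemented it.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical ID -> tree-number mapping. The values only mimic the
# dot-separated MeSH tree-number format; they are not real MeSH entries.
MESH_TREE_NUMBERS: Dict[str, List[str]] = {
    "ID_A": ["C04.100.200", "C17.300.400"],
    "ID_B": ["C04.100.500"],
    "ID_C": ["C10.200.300.400"],
}


def superclasses(mesh_id: str, depth: int = 1) -> Set[str]:
    """Truncate every tree number of `mesh_id` to its first `depth` levels."""
    return {
        ".".join(tree_number.split(".")[:depth])
        for tree_number in MESH_TREE_NUMBERS[mesh_id]
    }


def multi_hot(label_sets: List[Set[str]]) -> Tuple[List[str], List[List[int]]]:
    """Encode one label set per document as a 0/1 row over a sorted vocabulary."""
    vocab = sorted(set().union(*label_sets))
    index = {label: i for i, label in enumerate(vocab)}
    rows = []
    for labels in label_sets:
        row = [0] * len(vocab)
        for label in labels:
            row[index[label]] = 1
        rows.append(row)
    return vocab, rows


# Two toy documents, each annotated with a list of (hypothetical) disease IDs.
docs = [["ID_A", "ID_B"], ["ID_C"]]

# Experiment (1) style targets: one label per unique disease ID.
fine_vocab, fine_rows = multi_hot([set(ids) for ids in docs])

# Experiment (2) style targets: IDs collapsed to top-level superclasses.
coarse_sets = [set().union(*(superclasses(i) for i in ids)) for ids in docs]
coarse_vocab, coarse_rows = multi_hot(coarse_sets)
```

A term with several tree numbers maps to several superclasses, so the targets remain multi-label after collapsing. The `depth` parameter controls how far up the hierarchy the labels are moved; a smaller depth yields fewer, broader classes.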

Publisher

Association for Computing Machinery (ACM)

Subject

Information Systems and Management, Information Systems

Link

https://dl.acm.org/doi/pdf/10.1145/3451219

Cited by 1 article.

1. (Semi-) Automatic Construction of Knowledge Graph Metadata;The Semantic Web: ESWC 2022 Satellite Events;2022