The systematic assessment of completeness of public metadata accompanying omics studies

Author:

Huang Yu-Ning, Rajesh Anushka, Ayyala Ram, Sarkar Aditya, Guo Ruiwei, Ling Elizabeth, Nakashidze Irina, Wong Man Yee, Hu Jieting, Nosov Alexey, Chang Yutong, Abedalthagafi Malak S., Mangul Serghei

Abstract

The scientific community has accumulated enormous amounts of genomic data stored in specialized public repositories. These repositories make genomic data easily accessible, allowing the biomedical community to share omics datasets effectively. However, improperly annotated or incomplete metadata accompanying the raw omics data can reduce the utility of shared data for secondary analysis. In this study, we perform a comprehensive analysis of 137 studies spanning 18,559 samples across six therapeutic fields to assess the completeness of metadata accompanying omics studies, both in the publications and in their related online repositories, and we make observations about how the process of data sharing could be made reliable. This analysis involved finding studies in the following therapeutic fields: Alzheimer’s disease, acute myeloid leukemia, cystic fibrosis, cardiovascular diseases, inflammatory bowel disease, sepsis, and tuberculosis. We carefully examined the availability of metadata for nine clinical variables: disease condition, age, organism, sex, tissue type, ethnicity, country, mortality, and clinical severity. By comparing metadata availability in the original publications and the online repositories, we observed discrepancies in how metadata are shared. We find that the overall availability of metadata is 72.8%; the most completely reported phenotypes are disease condition and organism, and the least completely reported is mortality. Additionally, we examined the completeness of metadata reported separately in the original publications and the online repositories. The completeness of metadata in the original publications across the nine clinical phenotypes is 71.1%. In contrast, the overall completeness of metadata in the public repositories is 48.6%.
Our study is the first to systematically assess the completeness of metadata accompanying raw data across a large number of studies and phenotypes, and it opens a crucial discussion about solutions to improve the completeness and accessibility of metadata accompanying omics studies.
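
The abstract does not spell out how the completeness percentages are computed. As a rough illustration only, the sketch below assumes completeness is the share of (record, variable) cells that hold a value, and that a variable counts as available overall if either the publication or the repository reports it. The function names, the record layout, and both definitions are assumptions made for this example, not the authors' method.

```python
from typing import Mapping, Optional, Sequence

# The nine clinical variables named in the abstract.
VARIABLES = (
    "disease condition", "age", "organism", "sex", "tissue type",
    "ethnicity", "country", "mortality", "clinical severity",
)

# Hypothetical record layout: variable name -> reported value (or None).
Record = Mapping[str, Optional[str]]


def is_present(value: Optional[str]) -> bool:
    """Treat None and blank strings as missing metadata."""
    return value is not None and str(value).strip() != ""


def completeness(records: Sequence[Record],
                 variables: Sequence[str] = VARIABLES) -> float:
    """Percentage of (record, variable) cells that hold a value (assumed definition)."""
    total = len(records) * len(variables)
    if total == 0:
        return 0.0
    filled = sum(is_present(r.get(v)) for r in records for v in variables)
    return 100.0 * filled / total


def combine(publication: Sequence[Record],
            repository: Sequence[Record]) -> list:
    """Merge two aligned record lists; a variable is kept if either source reports it."""
    merged = []
    for pub, repo in zip(publication, repository):
        merged.append({
            v: pub.get(v) if is_present(pub.get(v)) else repo.get(v)
            for v in VARIABLES
        })
    return merged
```

Under these assumptions, `completeness(publication)`, `completeness(repository)`, and `completeness(combine(publication, repository))` would correspond to the publication, repository, and overall figures respectively. Whether the paper counts per study or per sample is not stated in the abstract, so the unit of a record is left open here.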

Publisher

Cold Spring Harbor Laboratory

Cited by 2 articles.
