Curation of over 10,000 transcriptomic studies to enable data reuse

Published: 2020-07-14

Author:

Lim Nathaniel, Tesar Stepan, Belmadani Manuel, Poirier-Morency Guillaume, Mancarci Burak Ogan, Sicherman Jordan, Jacobson Matthew, Leong Justin, Tan Patrick, Pavlidis Paul

Abstract

Vast amounts of transcriptomic data reside in public repositories, but effective reuse remains challenging. Issues include unstructured dataset metadata, inconsistent data processing and quality control, and inconsistent probe-gene mappings across microarray technologies. Thus, extensive curation and data reprocessing are necessary prior to any reuse. The Gemma bioinformatics system was created to help address these issues. Gemma consists of a database of curated transcriptomic datasets, analytical software, a web interface, and web services. Here we present an update on Gemma’s holdings, data processing and analysis pipelines, our curation guidelines, and software features. As of June 2020, Gemma contains 10,811 manually curated datasets (primarily human, mouse, and rat), over 395,000 samples, and hundreds of curated transcriptomic platforms (both microarray and RNA-sequencing). Dataset topics were represented with 10,215 distinct terms from 12 ontologies, for a total of 54,316 topic annotations (mean topics/dataset = 5.2). While Gemma has broad coverage of conditions and tissues, it captures a large majority of available brain-related datasets, which account for 34% of its holdings. Users can access the curated data and differential expression analyses through the Gemma website, RESTful service, and an R package.

Database URL: https://gemma.msl.ubc.ca/home.html

Publisher

Cold Spring Harbor Laboratory

References (66 articles)

1. NCBI GEO: archive for functional genomics data sets—update

2. Meta-Analysis of Hypoxic Transcriptomes from Public Databases; Biomedicines, 2020

3. Chen, H.-J., Li Yim, A. Y. F., Griffith, G. R., et al. (2019) Meta-Analysis of in vitro-Differentiated Macrophages Identifies Transcriptomic Signatures That Classify Disease Macrophages in vivo. Front. Immunol., 10.

4. Genome-wide expression profiling of schizophrenia using a large combined cohort

5. PDAC-ANN: an artificial neural network to predict pancreatic ductal adenocarcinoma based on gene expression; BMC Cancer, 2020

Cited by 1 article.

1. Estimating and Correcting for Off-Target Cellular Contamination in Brain Cell Type Specific RNA-Seq Data; Frontiers in Molecular Neuroscience; 2021-03-03