Uniform genomic data analysis in the NCI Genomic Data Commons

Published:2021-02-22 Issue:1 Volume:12 Page:
ISSN:2041-1723
Container-title:Nature Communications
language:en
Short-container-title:Nat Commun

Author:

Zhang Zhenyu, Hernandez Kyle, Savage Jeremiah, Li Shenglai, Miller Dan, Agrawal Stuti, Ortuno Francisco, Staudt Louis M., Heath Allison, Grossman Robert L.

Abstract

The goal of the National Cancer Institute’s (NCI’s) Genomic Data Commons (GDC) is to provide the cancer research community with a data repository of uniformly processed genomic and associated clinical data that enables data sharing and collaborative analysis in support of precision medicine. The initial GDC dataset includes genomic, epigenomic, proteomic, clinical and other data from the NCI TCGA and TARGET programs. Data production for the GDC started in June 2015 using an OpenStack-based private cloud. By June 2016, the GDC had analyzed more than 50,000 raw sequencing data inputs, as well as multiple other data types. Using the latest human genome reference build, GRCh38, the GDC generated a variety of data types, from aligned reads to somatic mutations, gene expression, miRNA expression, DNA methylation status, and copy number variation. In this paper, we describe the pipelines and workflows used to process and harmonize the data in the GDC. The generated data, as well as the original input files from TCGA and TARGET, are available for download and exploratory analysis at the GDC Data Portal and Legacy Archive (https://gdc.cancer.gov/).

Publisher

Springer Science and Business Media LLC

Subject

General Physics and Astronomy; General Biochemistry, Genetics and Molecular Biology; General Chemistry

Link

http://www.nature.com/articles/s41467-021-21254-9.pdf

References: 49 articles.

1. Grossman, R. L. et al. Toward a shared vision for cancer genomic data. N. Engl. J. Med. 375, 1109–1112 (2016).

2. Heath, A. P., Ferretti, V., Staudt, L. & Grossman, R. L. The NCI Genomic Data Commons. Unpublished (2020).

3. Guo, Y. et al. Improvements and impacts of GRCh38 human reference on high throughput sequencing data analysis. Genomics 109, 83–90 (2017).

4. Genovese, G. et al. Using population admixture to help complete maps of the human genome. Nat. Genet. 45, 406–414 (2013).

5. Van Doorslaer, K. et al. The Papillomavirus Episteme: a central resource for papillomavirus sequence data and analysis. Nucleic Acids Res. 41, D571–D578 (2012).

Cited by 77 articles.

1. End-to-end reproducible AI pipelines in radiology using the cloud;Nature Communications;2024-08-13

2. Integrated genomic/epigenomic analysis stratifies subtypes of clear cell ovarian carcinoma, highlighting their cellular origin;Scientific Reports;2024-08-13

3. Endoplasmic Reticulum Membrane Protein Complex Regulates Cancer Stem Cells and is Associated with Sorafenib Resistance in Hepatocellular Carcinoma;Journal of Hepatocellular Carcinoma;2024-08

4. Transcriptional programming mediated by the histone demethylase KDM5C regulates dendritic cell population heterogeneity and function;Cell Reports;2024-08

5. Multi-Omics Characterization of E3 Regulatory Patterns in Different Cancer Types;International Journal of Molecular Sciences;2024-07-11