The Open Pediatric Cancer Project
Author:
Geng Zhuangzhuang, Wafula Eric, Corbett Ryan J., Zhang Yuanchao, Jin Run, Gaonkar Krutika S., Shukla Sangeeta, Rathi Komal S., Hill Dave, Lahiri Aditya, Miller Daniel P., Sickler Alex, Keith Kelsey, Blackden Christopher, Chroni Antonia, Brown Miguel A., Kraya Adam A., Koschmann Carl J., Aldape Kenneth, Huang Xiaoyan, Rood Brian R., Mason Jennifer L., Trooskin Gerri R., Abdullaev Zied, Wang Pei, Zhu Yuankun, Farrow Bailey K., Farrel Alvin, Dybas Joseph M., Zhong Chuwei, Van Kuren Nicholas, Zhang Bo, Santi Mariarita, Phul Saksham, Chinwalla Asif T., Resnick Adam C., Diskin Sharon J., Tasian Sarah, Stefankiewicz Stephanie, Maris John M., Ennis Brian M., Lueder Matthew R., Naqvi Ammar S., Coleman Noel, Ma Weiping, Taylor Deanne, Rokita Jo Lynne
Abstract
Background: In 2019, the Open Pediatric Brain Tumor Atlas (OpenPBTA) was created as a global, collaborative open-science initiative to genomically characterize 1,074 pediatric brain tumors and 22 patient-derived cell lines. Here, we extend the OpenPBTA to create the Open Pediatric Cancer (OpenPedCan) Project, a harmonized open-source multi-omic dataset from 6,112 pediatric cancer patients with 7,096 tumor events across more than 100 histologies. Combined with RNA-Seq data from the Genotype-Tissue Expression (GTEx) project and The Cancer Genome Atlas (TCGA), OpenPedCan contains nearly 48,000 total biospecimens (24,002 tumor and 23,893 normal specimens).

Findings: We used Gabriella Miller Kids First (GMKF) workflows to harmonize WGS, WXS, RNA-Seq, and Targeted Sequencing datasets to include somatic SNVs, InDels, CNVs, SVs, RNA expression, fusions, and splice variants. We integrated summarized CPTAC whole cell proteomics and phospho-proteomics data and miRNA-Seq data, and developed a methylation array harmonization workflow that includes M-values, beta-values, and copy number calls. OpenPedCan contains reproducible, dockerized workflows in GitHub, CAVATICA, and Amazon Web Services (AWS) to deliver harmonized and processed data from over 60 scalable modules, which can be run both locally and on AWS. The processed data are released in a versioned manner, are accessible through CAVATICA or AWS S3 download (from GitHub), and are queryable through PedcBioPortal and the NCI's pediatric Molecular Targets Platform. Notably, we have expanded PBTA molecular subtyping to include methylation information to align with the WHO 2021 Central Nervous System Tumor classifications, allowing us to create research-grade integrated diagnoses for these tumors.

Conclusions: OpenPedCan data and its reproducible analysis module framework are openly available and can be used or adapted by researchers to accelerate discovery, validation, and clinical translation.
Publisher
Cold Spring Harbor Laboratory
Cited by
1 article.