Abstract
Glioblastoma multiforme is the most common form of brain cancer. Several lines of evidence suggest that glioblastoma multiforme has a genetic basis. A genetic test that could identify people at high risk of developing glioblastoma multiforme could improve our understanding of this form of brain cancer.

Using The Cancer Genome Atlas (TCGA) dataset, we identified common germline DNA copy number variations in the TCGA population. We tested whether different sets of these germline DNA copy number variations could effectively distinguish patients with glioblastoma multiforme from the other patients in the TCGA dataset. We used a gradient boosting machine, a machine learning classification algorithm, to classify TCGA patients based solely on a set of germline DNA copy number variations.

We found that this machine learning algorithm could distinguish TCGA glioblastoma multiforme patients from the other TCGA patients with an area under the receiver operating characteristic curve (AUC) of 0.875. When patients were grouped into quintiles by the algorithm's ranking, the highest-ranked quintile had an odds ratio of 3.78 (95% CI 3.25-4.40) relative to the average and an odds ratio of about 40 (95% CI 20-70) relative to the lowest quintile.

The identification of an effective germline genetic test to stratify the risk of developing glioblastoma multiforme should lead to a better understanding of how this cancer forms. This result might ultimately lead to better treatments for glioblastoma multiforme.
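The abstract does not describe the implementation, so the following is only a toy sketch of the evaluation pipeline it outlines. It assumes that each patient is encoded as a vector of binary CNV indicators. It fits a minimal log-loss gradient boosting model built from one-split stumps, written in pure Python as a stand-in for a production gradient boosting library. It then scores patients, computes the AUC (as the probability that a random case outscores a random control), and compares the odds of being a case across score quintiles. All function names, the feature encoding, the synthetic data, and the reading of "average" as the whole-sample odds are hypothetical. The AUC here is computed in-sample for brevity. A real evaluation would use held-out data.

```python
import math
import random


def fit_gbm_stumps(X, y, n_rounds=50, learning_rate=0.1):
    """Toy log-loss gradient boosting with one-split stumps on 0/1 features.

    X: rows of hypothetical binary CNV indicators; y: 0/1 labels (1 = case).
    Returns (base log-odds, list of (feature, shrunken value if 0, if 1)).
    """
    n, d = len(X), len(X[0])
    prior = sum(y) / n
    base = math.log(prior / (1 - prior))  # start from the sample log-odds
    F = [base] * n
    stumps = []
    for _ in range(n_rounds):
        p = [1 / (1 + math.exp(-f)) for f in F]
        r = [yi - pi for yi, pi in zip(y, p)]  # negative gradient of log-loss
        best = None
        for j in range(d):
            s, c = [0.0, 0.0], [0, 0]
            for row, ri in zip(X, r):
                s[row[j]] += ri
                c[row[j]] += 1
            means = [s[k] / c[k] if c[k] else 0.0 for k in (0, 1)]
            gain = sum(s[k] * means[k] for k in (0, 1))  # squared-error reduction
            if best is None or gain > best[0]:
                best = (gain, j, means)
        _, j, means = best
        stumps.append((j, learning_rate * means[0], learning_rate * means[1]))
        F = [f + learning_rate * means[row[j]] for f, row in zip(F, X)]
    return base, stumps


def predict_scores(model, X):
    """Sum the base log-odds and every stump's contribution for each row."""
    base, stumps = model
    return [base + sum(v1 if row[j] else v0 for j, v0, v1 in stumps) for row in X]


def roc_auc(y, scores):
    """AUC as P(random case scores above random control); ties count 1/2."""
    pos = [s for s, yi in zip(scores, y) if yi == 1]
    neg = [s for s, yi in zip(scores, y) if yi == 0]
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg)
    return wins / (len(pos) * len(neg))


def quintile_odds_ratios(y, scores):
    """Case odds per score quintile (lowest first), top vs whole sample, top vs bottom.

    Assumption: "average" means the odds of being a case in the whole sample.
    """
    order = sorted(range(len(y)), key=lambda i: scores[i])
    n = len(order)
    odds = []
    for q in range(5):
        idx = order[q * n // 5:(q + 1) * n // 5]
        cases = sum(y[i] for i in idx)
        controls = len(idx) - cases
        odds.append(cases / controls if controls else float("inf"))
    overall = sum(y) / (n - sum(y))
    top_vs_bottom = odds[-1] / odds[0] if odds[0] else float("inf")
    return odds, odds[-1] / overall, top_vs_bottom


if __name__ == "__main__":
    rng = random.Random(0)
    # Synthetic toy data: 20 binary CNV indicators; features 0-2 raise case odds.
    X, y = [], []
    for _ in range(1000):
        row = [1 if rng.random() < 0.3 else 0 for _ in range(20)]
        logit = -2.0 + 1.5 * (row[0] + row[1] + row[2])
        y.append(1 if rng.random() < 1 / (1 + math.exp(-logit)) else 0)
        X.append(row)
    model = fit_gbm_stumps(X, y)
    scores = predict_scores(model, X)
    print("in-sample AUC:", round(roc_auc(y, scores), 3))
    odds, top_vs_all, top_vs_bottom = quintile_odds_ratios(y, scores)
    print("quintile odds:", [round(o, 2) for o in odds])
    print("top vs all:", round(top_vs_all, 2), "top vs bottom:", round(top_vs_bottom, 2))
```

Ranking by score and cutting into fifths mirrors the quintile grouping described above. The odds-ratio definitions are one plausible reading of the abstract, not the authors' stated method.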
Publisher
Cold Spring Harbor Laboratory