Abstract
There is currently no method to distinguish between germline and somatic structural variants (SVs) in tumor samples that lack a matched normal sample. In this study, we analyzed several features of germline and somatic SVs from a cohort of 974 patients from The Cancer Genome Atlas (TCGA). We identified a total of 21 features that differed significantly between germline and somatic SVs. Several of the germline SV features were associated with each other, as were several of the somatic SV features. These associations also differed between the germline and somatic classes; for example, somatic inversions were more likely to be longer events than their germline counterparts. Using these features, we trained a support vector machine (SVM) classifier on 555,849 TCGA SVs to computationally distinguish germline from somatic SVs in the absence of a matched normal. This classifier had an ROC curve AUC of 0.984 when tested on an independent test set of 277,925 TCGA SVs. On this test set, the classifier achieved a positive predictive value (PPV) of 0.81, meaning that 81% of the SVs it called somatic were truly somatic. We further tested the classifier on a separate set of 7,623 SVs from pediatric high-grade gliomas (pHGG). In this non-TCGA cohort, our classifier achieved a PPV of 0.828, showing robust performance across datasets.
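The two metrics reported above, PPV and ROC AUC, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `ppv` and `roc_auc`, the 0.5 decision threshold, and the toy labels and scores are all assumptions made up for illustration, with 1 standing for somatic and 0 for germline.

```python
from typing import Sequence


def ppv(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Positive predictive value: TP / (TP + FP), with 1 = somatic."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if p == 1 and t == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if p == 1 and t == 0)
    if tp + fp == 0:
        raise ValueError("no SVs were called somatic; PPV is undefined")
    return tp / (tp + fp)


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC as the probability that a randomly chosen somatic SV
    scores higher than a randomly chosen germline SV (ties count 0.5).
    Quadratic in the number of SVs, so only suitable for small inputs."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present to compute AUC")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data, NOT taken from the study.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2, 0.6, 0.1]

# Assumed decision threshold of 0.5 on the classifier score.
y_pred = [1 if s >= 0.5 else 0 for s in scores]

print("PPV:", ppv(y_true, y_pred))
print("AUC:", roc_auc(y_true, scores))
```

PPV depends on the chosen threshold and on the proportion of somatic SVs in the sample, whereas AUC summarizes ranking quality across all thresholds, which is why the two figures are reported separately.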
Publisher
Cold Spring Harbor Laboratory
Cited by
1 article.