Abstract
Background: Approximately half of all high-grade serous ovarian carcinomas (HGSCs) have a therapeutically targetable defect in the homologous recombination (HR) DNA repair mechanism. While genomic and transcriptomic methods developed for other cancer types can identify HR-deficient (HRD) samples, there are no gene expression-based tools that predict HR repair status in HGSC specifically. We have built the first HGSC-specific model to predict HR repair status using gene expression.
Methods: We separated The Cancer Genome Atlas (TCGA) cohort of HGSCs (n = 361) into training (n = 288) and testing (n = 73) sets and labelled each case as HRD or HR-proficient (HRP) according to the clinical standard for classification, a score of HRD genomic damage. Using the training set, we performed differential gene expression analysis between HRD and HRP cases. The 2604 significantly differentially expressed genes were then used to tune and train a penalised logistic regression model.
Results: IdentifiHR is an elastic net penalised logistic regression model that uses the expression of 209 genes to predict HR status in HGSC. These genes capture known regions of HR-specific copy number alteration, which impact gene expression levels, and preserve the genomic damage signal. IdentifiHR has an accuracy of 85% in the TCGA test set and of 91% in an independent cohort of 99 samples, collected from primary tumours before (n = 74/99) and after autopsy (n = 6/99), in addition to ascites (n = 12/99) and normal fallopian tube samples (n = 7/99). Further, IdentifiHR is 84% accurate in pseudobulked single-cell HGSC sequencing from 37 patients and outperforms existing gene expression-based methods for predicting HR status, namely BRCAness, MultiscaleHRD and expHRD.
Conclusions: IdentifiHR is an accurate model for predicting HR status in HGSC using gene expression alone, and is available as an R package from https://github.com/DavidsonGroup/IdentifiHR.
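The model-fitting idea described in the Methods and Results, an elastic net penalised logistic regression trained on one split and scored on a held-out split, can be illustrated with a minimal sketch. This is not the IdentifiHR implementation (which is distributed as an R package) and does not use its data: the features are synthetic random numbers standing in for gene expression, the 80/20 split only mirrors the proportions reported above, and the function names, the penalty strength `lam`, the mixing parameter `alpha` and the proximal gradient solver are all assumptions chosen for illustration.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def soft_threshold(value, threshold):
    """Proximal operator of the L1 penalty: shrink a value towards zero."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def linear_score(row, weights, intercept):
    return intercept + sum(w * x for w, x in zip(weights, row))


def fit_elastic_net_logistic(X, y, lam=0.05, alpha=0.5, step=0.5, n_iter=300):
    """Proximal gradient descent for logistic loss plus an elastic net penalty.

    Penalty (hypothetical parameterisation): lam * (alpha * |w|_1 + (1 - alpha) / 2 * |w|_2^2).
    The intercept is left unpenalised.
    """
    n, p = len(X), len(X[0])
    weights, intercept = [0.0] * p, 0.0
    for _ in range(n_iter):
        grad_w, grad_b = [0.0] * p, 0.0
        for row, label in zip(X, y):
            err = sigmoid(linear_score(row, weights, intercept)) - label
            grad_b += err / n
            for j in range(p):
                grad_w[j] += err * row[j] / n
        intercept -= step * grad_b
        for j in range(p):
            # Gradient step on the smooth part (log-loss + ridge term),
            # then soft-thresholding to apply the L1 (lasso) term.
            smooth = weights[j] - step * (grad_w[j] + lam * (1 - alpha) * weights[j])
            weights[j] = soft_threshold(smooth, step * lam * alpha)
    return weights, intercept


def predict(X, weights, intercept):
    return [1 if sigmoid(linear_score(row, weights, intercept)) >= 0.5 else 0 for row in X]


def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Synthetic stand-in data: 200 "samples" x 20 "genes"; only the first three
# features carry signal. Label 1 plays the role of HRD, 0 the role of HRP.
random.seed(0)
N_SAMPLES, N_FEATURES = 200, 20
X = [[random.gauss(0, 1) for _ in range(N_FEATURES)] for _ in range(N_SAMPLES)]
y = [1 if row[0] + row[1] - row[2] + random.gauss(0, 0.3) > 0 else 0 for row in X]

# 80/20 train/test split, similar in proportion to 288/73.
n_train = N_SAMPLES * 4 // 5
X_train, y_train = X[:n_train], y[:n_train]
X_test, y_test = X[n_train:], y[n_train:]

weights, intercept = fit_elastic_net_logistic(X_train, y_train)
test_accuracy = accuracy(y_test, predict(X_test, weights, intercept))
n_selected = sum(1 for w in weights if w != 0.0)
print(f"non-zero coefficients: {n_selected}/{N_FEATURES}, test accuracy: {test_accuracy:.2f}")
```

The L1 part of the penalty drives uninformative coefficients to exactly zero, which is the general mechanism by which an elastic net model can retain a subset of its candidate features; the ridge part stabilises the coefficients of correlated features. In practice `lam` and `alpha` would be tuned, for example by cross-validation within the training set.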
Publisher
Cold Spring Harbor Laboratory