Abstract
Background: With continued improvements in breast cancer (BC) outcomes and a risk of recurrence that persists for at least 20 years post-diagnosis, it is important to continue following up clinical trial participants to characterise long-term treatment impact. Traditionally, follow-up has been conducted via hospitals, which places a burden on patients and site staff. Using routinely collected health datasets (RCHD) as an alternative is attractive, but historically cancer recurrence, unlike initial cancer diagnosis, has been poorly recorded. Here we use data collected prospectively from large, multi-centre BC clinical trials to develop and test a procedure to identify recurrence within RCHD.
Methods: Data from four trials of early breast cancer (TACT2, POETIC, IMPORT-HIGH and FAST-Forward), in which recurrence data were collected prospectively (gold standard), were linked with RCHD (including cancer registry and Hospital Episode Statistics; HES) managed by NHS England. The procedure identified episodes of clinical activity within RCHD to classify each event type (local recurrence, distant recurrence, second cancers, death) separately; these were then combined to derive time-to-recurrence (TTR), invasive disease-free survival (iDFS) and overall survival (OS) outcomes. Sensitivity, specificity, positive predictive value (PPV) and negative predictive value (NPV) of RCHD-derived events were calculated against the trial data. Hazard ratios from Cox regression modelling, log-rank test p-values and three-year survival rates for the randomised treatments were reported separately for RCHD and trial data.
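The four agreement measures named above can be illustrated with a minimal sketch. It assumes one boolean per participant from each source (event recorded or not) and ignores event timing, which the full procedure would also need to consider. The names `Agreement` and `agreement_metrics` are hypothetical and are not taken from the study.

```python
import math
from typing import NamedTuple, Sequence


class Agreement(NamedTuple):
    sensitivity: float
    specificity: float
    ppv: float
    npv: float


def _ratio(numerator: int, denominator: int) -> float:
    # The measure is undefined when its denominator is zero.
    return numerator / denominator if denominator else math.nan


def agreement_metrics(trial: Sequence[bool], rchd: Sequence[bool]) -> Agreement:
    """Compare RCHD-derived event flags against trial (gold standard) flags.

    Both sequences hold one boolean per participant, in the same order.
    """
    if len(trial) != len(rchd):
        raise ValueError("trial and rchd must have one entry per participant")

    pairs = list(zip(trial, rchd))
    tp = sum(1 for t, r in pairs if t and r)          # event in both sources
    fp = sum(1 for t, r in pairs if not t and r)      # event in RCHD only
    fn = sum(1 for t, r in pairs if t and not r)      # event in trial only
    tn = sum(1 for t, r in pairs if not t and not r)  # event in neither

    return Agreement(
        sensitivity=_ratio(tp, tp + fn),
        specificity=_ratio(tn, tn + fp),
        ppv=_ratio(tp, tp + fp),
        npv=_ratio(tn, tn + fn),
    )
```

In this sketch a measure with a zero denominator is returned as NaN rather than raising, so that endpoints with no events in a small subgroup can still be tabulated.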
Results: The final procedure used Cancer Registry diagnoses to identify initial BCs (for quality control purposes) and second primary cancers. Deaths were identified via date and cause of death. Distant recurrence was identified predominantly by direct indicators of metastases (e.g. ICD-10 codes C77X-79X). Local recurrence was identified via OPCS-4 codes for relevant surgeries. For TTR, iDFS and OS, agreement between study and RCHD events was reasonable. Specificity was high across all endpoints (range: 97.9%-99.9% for the three training datasets combined), as was NPV (range: 95.2%-99.6%). Sensitivity and PPV were more variable, ranging from 72.9% to 97.2% and from 82.6% to 99.5%, respectively. Values were similar in the test dataset. Survival estimates for TTR, iDFS and OS were similar between study and RCHD data.
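As an illustration of the direct metastasis indicator described above, the following sketch flags diagnosis codes in the C77 to C79 range and returns the earliest matching episode date. It assumes that "C77X-79X" means any code beginning C77, C78 or C79, and that episodes arrive as (date, code) pairs. The function names and input layout are hypothetical, and the study's actual procedure combines further indicators not shown here.

```python
import re
from datetime import date
from typing import Iterable, Optional, Tuple

# Assumption: "C77X-79X" covers every ICD-10 code starting C77, C78 or C79.
_METASTASIS_CODE = re.compile(r"^C7[789]")


def is_metastasis_code(code: str) -> bool:
    """Return True if an ICD-10 code falls in the assumed C77-C79 range."""
    # Normalise case and drop the optional dot, so "c78.0" becomes "C780".
    normalised = code.strip().upper().replace(".", "")
    return bool(_METASTASIS_CODE.match(normalised))


def first_metastasis_date(episodes: Iterable[Tuple[date, str]]) -> Optional[date]:
    """Return the earliest episode date carrying a metastasis code, if any."""
    matching_dates = [when for when, code in episodes if is_metastasis_code(code)]
    return min(matching_dates, default=None)
```

For example, given episodes coded C50.9, C78.0 and C79.5, only the last two match, and the earlier of their dates would be taken as the candidate distant recurrence date.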
Conclusion: It is possible, with reasonable accuracy, to identify cancer recurrences using RCHD in place of hospital-based data collection after the point of primary analysis.