BACKGROUND
To reduce bladder cancer mortality, efforts need to concentrate on early detection of the disease, which allows more effective therapeutic intervention. Strong risk factors have been identified (e.g., smoking status, age, occupational exposure), and diagnostic tools such as cystoscopy have been proposed. However, no fully satisfactory solution (noninvasive, inexpensive, and high-performing) is yet available for widespread deployment. New models based on cytology image classification have recently been developed and look promising, but there is still room to improve their performance.
OBJECTIVE
We aimed to evaluate the benefit of combining a risk factor model, built by reusing large-scale clinical data, with a digital cytology image-based model for bladder cancer detection.
METHODS
The first step was to design a predictive model based on clinical data (i.e., risk factors identified in the literature) extracted from the Clinical Data Warehouse of the Rennes Hospital, using machine learning algorithms (logistic regression, random forest, and support vector machine). This model provides a score corresponding to the risk of developing bladder cancer based on the patient's clinical profile. In the second step, we investigated three strategies (logistic regression, decision tree, and a custom strategy based on score interpretation) to combine this score with that of an image-based model and produce a robust bladder cancer score.
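As a rough illustration of the logistic regression combination strategy, the sketch below fits a two-input logistic model that merges a risk factor score and an image-based score into a single score. It is only a sketch under stated assumptions: the function names (fit_score_combiner, combined_score), the learning rate, and the toy scores and labels are hypothetical and are not taken from the study, and plain gradient descent stands in for whatever fitting procedure was actually used.

```python
import math


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_score_combiner(risk_scores, image_scores, labels, lr=0.5, epochs=2000):
    """Fit p = sigmoid(b + w1 * risk + w2 * image) by batch gradient descent."""
    b, w1, w2 = 0.0, 0.0, 0.0
    n = len(labels)
    for _ in range(epochs):
        grad_b = grad_w1 = grad_w2 = 0.0
        for r, i, y in zip(risk_scores, image_scores, labels):
            # Derivative of the log-loss with respect to the linear predictor.
            err = sigmoid(b + w1 * r + w2 * i) - y
            grad_b += err
            grad_w1 += err * r
            grad_w2 += err * i
        b -= lr * grad_b / n
        w1 -= lr * grad_w1 / n
        w2 -= lr * grad_w2 / n
    return b, w1, w2


def combined_score(params, risk, image):
    """Return the combined score in (0, 1) for one patient."""
    b, w1, w2 = params
    return sigmoid(b + w1 * risk + w2 * image)


# Hypothetical toy data: one risk factor score and one image score per patient.
risk = [0.10, 0.20, 0.30, 0.70, 0.80, 0.60, 0.25, 0.75]
image = [0.20, 0.10, 0.40, 0.60, 0.90, 0.70, 0.35, 0.55]
label = [0, 0, 0, 1, 1, 1, 0, 1]

params = fit_score_combiner(risk, image, label)
high = combined_score(params, 0.80, 0.90)
low = combined_score(params, 0.10, 0.20)
```

In this toy setting, a patient with two high input scores receives a higher combined score than a patient with two low ones. A decision tree or a rule-based combination would replace only the fitting and scoring functions.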
RESULTS
Two datasets were collected. The first, comprising clinical data of 5422 patients extracted from the Clinical Data Warehouse, was used to design the risk factor-based model. The second, used to measure the models' performance, comprised 651 patients from a clinical trial for whom cytology images were collected along with clinicobiological features. On this second dataset, the combination of both models achieved an area under the curve (AUC) of 0.81 on the training set and 0.83 on the test set, demonstrating the value of combining risk factor-based and image-based models. The combination also yielded a higher associated risk of cancer than VisioCyt for all classes, especially for low-grade bladder cancer.
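For readers less familiar with the metric, the sketch below shows one way an AUC can be computed from labels and scores, assuming the reported AUC is the area under the receiver operating characteristic curve. The function name roc_auc and the toy values are hypothetical; the values are not study data.

```python
def roc_auc(labels, scores):
    """AUC as the probability that a randomly chosen positive case
    scores higher than a randomly chosen negative case (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy example: 2 negative and 2 positive cases.
toy_labels = [0, 0, 1, 1]
toy_scores = [0.1, 0.4, 0.35, 0.8]
toy_auc = roc_auc(toy_labels, toy_scores)  # 3 of 4 positive/negative pairs are ordered correctly
```

The pairwise formulation is quadratic in the number of cases, which is acceptable for a few hundred patients. A rank-based implementation would scale better on larger cohorts.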
CONCLUSIONS
These results demonstrate the value of combining clinical and biological information, especially for improving the detection of low-grade bladder cancer. The automatic extraction of clinical features still needs to be improved to make the risk factor-based model more robust. Nevertheless, the results already support the assumption that this type of approach will benefit patients.