Abstract
Objective
To identify and develop a machine learning (ML)-based classifier that uses the metabolic profiles of serum samples to accurately identify individuals with ovarian cancer (OC).
Methods
Serum samples collected from 431 OC patients and 133 normal women at four geographic locations were analyzed by mass spectrometry. Reliable metabolites were identified using recursive feature elimination (RFE) coupled with repeated cross-validation (CV) and were used to develop a consensus classifier able to distinguish cancer from non-cancer. The probabilities assigned to individuals by the model were used to create a clinical tool that assigns a likelihood that an individual patient sample is cancer or normal.
Results
Our consensus classification model distinguishes cancer from control samples with 93% accuracy. The frequency distribution of individual patient scores was used to develop a clinical tool that assigns a likelihood that an individual patient does or does not have cancer.
Conclusions
An integrative approach combining metabolomic profiles with ML-based classifiers was used to develop a clinical tool that assigns a probability that an individual patient does or does not have OC. This personalized, probabilistic approach to cancer diagnostics is more clinically informative and accurate than traditional binary (yes/no) tests and represents a promising new direction in the early detection of OC.
HIGHLIGHTS
Predictive models derived from machine learning (ML) analyses of serum metabolic profiles can accurately (PPV 93%) detect ovarian cancer (OC).
Only a small minority (7%) of the most predictively informative metabolites are currently annotated.
Lipids predominate among the currently annotated metabolites that are most predictively informative.
The frequency distribution of model-derived patient scores can be used to develop a useful clinical tool for the diagnosis of OC.
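As an illustration only, the step in which the frequency distribution of model-assigned patient scores is turned into a per-patient likelihood can be sketched as follows. This is a minimal sketch under assumptions the abstract does not state: scores lie in [0, 1], they are grouped into equal-width bins, and the likelihood reported for a new patient is the fraction of training samples in that patient's bin that were cancer. The function names `build_likelihood_table` and `likelihood_of_cancer` and the default of 10 bins are hypothetical and are not the authors' method.

```python
from bisect import bisect_right


def build_likelihood_table(cancer_scores, control_scores, n_bins=10):
    """Bin model scores and return (interior edges, per-bin cancer fraction).

    Hypothetical helper: equal-width binning over [0, 1] is an assumption,
    not a detail taken from the study.
    """
    # Interior bin edges, e.g. 0.1, 0.2, ..., 0.9 for 10 bins.
    edges = [i / n_bins for i in range(1, n_bins)]
    cancer_counts = [0] * n_bins
    control_counts = [0] * n_bins
    for s in cancer_scores:
        cancer_counts[bisect_right(edges, s)] += 1
    for s in control_scores:
        control_counts[bisect_right(edges, s)] += 1

    table = []
    for c, n in zip(cancer_counts, control_counts):
        total = c + n
        # None marks a bin with no training samples (no estimate available).
        table.append(c / total if total else None)
    return edges, table


def likelihood_of_cancer(score, edges, table):
    """Look up the empirical cancer fraction for the bin containing `score`."""
    return table[bisect_right(edges, score)]


if __name__ == "__main__":
    # Toy scores for illustration; these are not data from the study.
    cancer = [0.95, 0.90, 0.85, 0.15]
    control = [0.05, 0.10, 0.12, 0.92]
    edges, table = build_likelihood_table(cancer, control)
    print(likelihood_of_cancer(0.97, edges, table))
```

Because the per-bin fraction reflects the cancer-to-control ratio of the training cohort (431:133 here), a tool built this way would need to be recalibrated to the prevalence of the population being tested before its outputs could be read as clinical probabilities.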
Publisher
Cold Spring Harbor Laboratory