Abstract
Objectives: Automated classification of flow cytometry data has the potential to reduce errors and accelerate flow cytometry interpretation. We sought a machine learning approach that is accurate, intuitively easy to understand, and highlights the cells that are most important to the algorithm's prediction for a given case.
Methods: We developed an ensemble of convolutional neural networks (CNNs) for classification and for visualization of impactful cell populations in detecting classic Hodgkin lymphoma, using two-dimensional (2D) histograms. Data from 977 and 245 clinical flow cytometry cases were used for training and testing, respectively. For each flow cytometry file, 78 non-gated 2D histograms were created. SHAP (SHapley Additive exPlanations) values were calculated to determine the most impactful 2D histograms and the most impactful regions within them. The SHAP values from all 78 histograms were then projected back onto the original cell data for gating and visualization using standard flow cytometry software.
Results: The algorithm achieved 67.7% recall (sensitivity), 82.4% precision, and 0.92 AUROC. Visualization of the cell populations that were important in individual predictions demonstrated correlations with known biology.
Conclusions: The method presented enables model explainability while highlighting important cell populations in individual flow cytometry specimens, with potential applications in both diagnosis and discovery of previously overlooked key cell populations.
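As a rough illustration of the Methods, the sketch below builds one non-gated 2D histogram per pair of channels and then maps a per-bin attribution array back to individual cells. It is a minimal sketch under stated assumptions, not the authors' implementation. The abstract does not give the panel size; 78 is the number of unordered pairs among 13 parameters, so 13 channels are assumed here. The function names, the bin count, the scaling of events to [0, 1], and the choice to sum per-bin attributions across histograms are all hypothetical, and the attribution arrays stand in for SHAP values computed elsewhere.

```python
from itertools import combinations

import numpy as np


def pairwise_histograms(events, bins=32, value_range=(0.0, 1.0)):
    """Build one 2D count histogram per unordered pair of channels.

    events: array of shape (n_cells, n_channels), assumed pre-scaled
    to value_range (an assumption of this sketch).
    """
    lo, hi = value_range
    clipped = np.clip(events, lo, hi)  # keep every cell inside the grid
    hists = {}
    for i, j in combinations(range(clipped.shape[1]), 2):
        counts, _, _ = np.histogram2d(
            clipped[:, i], clipped[:, j],
            bins=bins, range=[[lo, hi], [lo, hi]],
        )
        hists[(i, j)] = counts
    return hists


def project_to_cells(events, attributions, bins=32, value_range=(0.0, 1.0)):
    """Give each cell the sum of the attributions of the bins it occupies.

    attributions: dict mapping (i, j) to a (bins, bins) array, for
    example per-bin SHAP values (hypothetical input format).
    """
    lo, hi = value_range
    clipped = np.clip(events, lo, hi)
    # Bin index of every cell on every channel.
    idx = np.floor((clipped - lo) / (hi - lo) * bins).astype(int)
    idx = np.clip(idx, 0, bins - 1)  # the upper edge belongs to the last bin
    scores = np.zeros(clipped.shape[0])
    for (i, j), attr in attributions.items():
        scores += attr[idx[:, i], idx[:, j]]
    return scores


# Synthetic stand-in for one flow cytometry file: 500 cells, 13 channels.
rng = np.random.default_rng(0)
events = rng.random((500, 13))
hists = pairwise_histograms(events, bins=16)
print(len(hists))  # 13 choose 2 = 78 histograms
```

The per-cell scores returned by `project_to_cells` could then be written out as an extra column alongside the original events, so that high-scoring cells can be gated in standard flow cytometry software.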
Publisher
Cold Spring Harbor Laboratory