Class imbalance should not throw you off balance: Choosing the right classifiers and performance metrics for brain decoding with imbalanced data

Published: 2022-07-20

Author:

Thölke Philipp, Mantilla-Ramos Yorguin-Jose, Abdelhedi Hamza, Maschke Charlotte, Dehgan Arthur, Harel Yann, Kemtur Anirudha, Berrada Loubna Mekki, Sahraoui Myriam, Young Tammy, Bellemare Pépin Antoine, El Khantour Clara, Landry Mathieu, Pascarella Annalisa, Hadid Vanessa, Combrisson Etienne, O’Byrne Jordan, Jerbi Karim

Abstract

Machine learning (ML) is increasingly used in cognitive, computational and clinical neuroscience. The reliable and efficient application of ML requires a sound understanding of its subtleties and limitations. Training ML models on datasets with imbalanced classes is a particularly common problem, and it can have severe consequences if not adequately addressed. With the neuroscience ML user in mind, this paper provides a didactic assessment of the class imbalance problem and illustrates its impact through systematic manipulation of data imbalance ratios in (i) simulated data and (ii) brain data recorded with electroencephalography (EEG) and magnetoencephalography (MEG). Our results illustrate how the widely used Accuracy (Acc) metric, which measures the overall proportion of successful predictions, yields misleadingly high performance as class imbalance increases. Because Acc weights the per-class ratios of correct predictions proportionally to class size, it largely disregards the performance on the minority class. A binary classification model that learns to systematically vote for the majority class will yield an artificially high decoding accuracy that directly reflects the imbalance between the two classes, rather than any genuine generalizable ability to discriminate between them. We show that other evaluation metrics, such as the Area Under the Curve (AUC) of the Receiver Operating Characteristic (ROC) and the less common Balanced Accuracy (BAcc) metric, defined as the arithmetic mean of sensitivity and specificity, provide more reliable performance evaluations for imbalanced data. Our findings also highlight the robustness of Random Forest (RF), and the benefits of using stratified cross-validation and hyperparameter optimization to tackle data imbalance.
Critically, for neuroscience ML applications that seek to minimize overall classification error, we recommend the routine use of BAcc, which in the specific case of balanced data is equivalent to using standard Acc, and readily extends to multi-class settings. Importantly, we present a list of recommendations for dealing with imbalanced data, as well as open-source code to allow the neuroscience community to replicate and extend our observations and explore alternative approaches to coping with imbalanced data.
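
The contrast between Acc and BAcc described in the abstract can be illustrated with a small sketch. This is not the authors' open-source code. The function names, the 90/10 class split, and the label vectors below are hypothetical choices made only for illustration, and the example uses plain Python rather than any particular ML library.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Overall proportion of correct predictions (Acc)."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def balanced_accuracy(y_true, y_pred):
    """Unweighted mean of per-class recall (BAcc).

    With two classes this is the arithmetic mean of sensitivity
    and specificity; the same formula covers more than two classes.
    """
    class_sizes = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    recalls = [hits[c] / n for c, n in class_sizes.items()]
    return sum(recalls) / len(recalls)


# Hypothetical imbalanced labels: 90 majority-class trials (0), 10 minority (1).
y_true = [0] * 90 + [1] * 10
# A "classifier" that always votes for the majority class.
y_majority = [0] * len(y_true)

acc_imbalanced = accuracy(y_true, y_majority)            # 0.9, mirrors the 90/10 split
bacc_imbalanced = balanced_accuracy(y_true, y_majority)  # 0.5, i.e. chance level

# Hypothetical balanced labels: Acc and BAcc coincide when classes are equal in size.
y_true_bal = [0] * 50 + [1] * 50
y_pred_bal = [0] * 40 + [1] * 10 + [1] * 35 + [0] * 15
acc_balanced = accuracy(y_true_bal, y_pred_bal)
bacc_balanced = balanced_accuracy(y_true_bal, y_pred_bal)

print(acc_imbalanced, bacc_imbalanced)
print(acc_balanced, bacc_balanced)
```

In this toy case the majority-vote model reaches an Acc equal to the majority-class proportion while its BAcc stays at chance, and on the balanced labels the two metrics agree, consistent with the abstract's description.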

Publisher

Cold Spring Harbor Laboratory

References (80 articles)

1. Neuroscience-Inspired Artificial Intelligence

2. Natural and artificial intelligence: A brief introduction to the interplay between ai and neuroscience research;Neural Networks,2021

3. A deep learning framework for neuroscience

4. Computational rationality: A converging paradigm for intelligence in brains, minds, and machines

5. The mutual inspirations of machine learning and neuroscience;Neuron,2015

Cited by 4 articles.

1. Unravelling the neural dynamics of hypnotic susceptibility: Aperiodic neural activity as a central feature of hypnosis;2023-11-17

2. Differential Patterns of Associations within Audiovisual Integration Networks in Children with ADHD;2023-09-27

3. Phase prediction and experimental realisation of a new high entropy alloy using machine learning;Scientific Reports;2023-03-23

4. Three simple steps to improve the interpretability of EEG-SVM studies;Journal of Neurophysiology;2022-12-01