Linear Classifiers Under Infinite Imbalance

Published: 2023-12-21
ISSN: 0030-364X
Journal: Operations Research
Language: English

Author:

Paul Glasserman¹, Mike Li¹

Affiliation:

1. Columbia Business School, New York, New York 10027

Abstract

Understanding Linear Classifiers in the Face of Severe Data Imbalance

In “Linear Classifiers Under Infinite Imbalance,” Paul Glasserman and Mike Li study binary classification when the data are severely imbalanced, a common situation in fields such as healthcare and finance. Building on Owen's analysis of logistic regression, they extend the study to a broader class of linear discriminant functions. Their key contribution is a proof that the coefficient vectors of these functions have infinite-imbalance limits, together with explicit expressions for those limits; the analysis distinguishes between classifiers with subexponential and exponential weight functions. This distinction clarifies how classifiers can be adjusted under extreme imbalance to improve either specificity or sensitivity. The authors also connect their findings to robustness and conservatism in classification decisions, offering insight into optimal classifier design against the most challenging alternatives. Numerical examples and a credit risk case study illustrate the practical implications of the theory.
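The logistic-regression limit that this work builds on can be checked numerically. As Owen's result is usually stated, if the minority sample is held fixed while the majority sample from a distribution F0 grows without bound, the fitted intercept tends to minus infinity, but (under conditions on the tails of F0) the slope vector β converges to the solution of E_F0[exp(x'β)(x − x̄)] = 0, where x̄ is the minority sample mean. For Gaussian F0 = N(μ0, Σ), this limit is β = Σ⁻¹(x̄ − μ0). The Python sketch below is not taken from the paper. It assumes a one-dimensional F0 = N(0, 1) and three hypothetical minority points with mean x̄ = 1, so the predicted limiting slope is β = 1. The helper `fit_logistic_1d` is a hypothetical Newton-method maximum-likelihood fit written only for this illustration.

```python
import math
import random


def fit_logistic_1d(xs, ys, max_iter=100, tol=1e-8):
    """Maximum-likelihood fit of P(y=1 | x) = 1 / (1 + exp(-(a + b*x))).

    Hypothetical helper for illustration: Newton's method with step halving.
    Returns (intercept a, slope b).
    """

    def loglik(a, b):
        total = 0.0
        for x, y in zip(xs, ys):
            z = a + b * x
            # Numerically stable log(1 + e^z).
            softplus = z + math.log1p(math.exp(-z)) if z > 0 else math.log1p(math.exp(z))
            total += y * z - softplus
        return total

    n1 = sum(ys)
    n0 = len(ys) - n1
    # Start from the intercept-only MLE, which keeps early Newton steps small.
    a, b = math.log(n1 / n0), 0.0
    current = loglik(a, b)
    for _ in range(max_iter):
        # Score vector (g0, g1) and information matrix [[h00, h01], [h01, h11]].
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y in zip(xs, ys):
            z = a + b * x
            ez = math.exp(-abs(z))
            p = 1.0 / (1.0 + ez) if z >= 0 else ez / (1.0 + ez)
            w = p * (1.0 - p)
            g0 += y - p
            g1 += (y - p) * x
            h00 += w
            h01 += w * x
            h11 += w * x * x
        # Newton direction d solves (information matrix) d = score.
        det = h00 * h11 - h01 * h01
        da = (h11 * g0 - h01 * g1) / det
        db = (h00 * g1 - h01 * g0) / det
        if abs(da) + abs(db) < tol:
            break
        # Halve the step until the log-likelihood does not decrease.
        step = 1.0
        while step > 1e-12:
            candidate = loglik(a + step * da, b + step * db)
            if candidate >= current:
                break
            step /= 2.0
        else:
            break  # no ascent step found; treat as converged
        a, b, current = a + step * da, b + step * db, candidate
    return a, b


rng = random.Random(0)
minority = [0.5, 1.0, 1.5]  # hypothetical minority sample, mean x_bar = 1
results = {}
for n0 in (1_000, 10_000, 100_000):
    majority = [rng.gauss(0.0, 1.0) for _ in range(n0)]  # F0 = N(0, 1)
    xs = majority + minority
    ys = [0] * n0 + [1] * len(minority)
    results[n0] = fit_logistic_1d(xs, ys)
    a, b = results[n0]
    print(f"n0={n0:>7}: intercept={a:8.3f}  slope={b:6.3f}")
```

One would expect the fitted intercept to fall by roughly log 10 ≈ 2.3 for each tenfold increase in n0 while the slope settles near the predicted limit of 1. Starting from the intercept-only fit and halving steps are design choices that keep Newton's method stable when the classes are this unbalanced.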

Publisher

Institute for Operations Research and the Management Sciences (INFORMS)

Subject

Management Science and Operations Research, Computer Science Applications

Link

https://pubsonline.informs.org/doi/pdf/10.1287/opre.2021.0376
