A New Smooth Approximation to the Zero One Loss with a Probabilistic Interpretation

Published: 2020-02-04 Issue: 1 Volume: 14 Page: 1-28
ISSN: 1556-4681
Container-title: ACM Transactions on Knowledge Discovery from Data
Language: en
Short-container-title: ACM Trans. Knowl. Discov. Data

Author:

Hasan Md Kamrul¹, Pal Christopher²

Affiliation:

1. École Polytechnique Montréal, QC, Canada

2. Mila, École Polytechnique Montréal, QC, Canada

Abstract

We examine a new form of smooth approximation to the zero one loss in which learning is performed using a reformulation of the widely used logistic function. Our approach is based on using the posterior mean of a novel generalized Beta-Bernoulli formulation. This leads to a generalized logistic function that approximates the zero one loss but retains a probabilistic formulation, conferring a number of useful properties. The approach is easily generalized to kernel logistic regression and easily integrated into methods for structured prediction. We present experiments in which we learn such models using an optimization method that combines gradient descent with coordinate descent using localized grid search so as to escape from local minima. Our experiments indicate that optimization quality is improved when learning metaparameters are themselves optimized using a validation set. Our experiments show improved performance relative to widely used logistic and hinge loss methods on a wide variety of problems ranging from standard UC Irvine and libSVM evaluation datasets to product review predictions and a visual information extraction task. We observe that the approach: (1) is more robust to outliers than the logistic and hinge losses; (2) outperforms comparable logistic and max margin models on larger scale benchmark problems; (3) when combined with a Gaussian–Laplacian mixture prior on parameters, yields sparser solutions in its kernelized version than Support Vector Machine classifiers; and (4) when integrated into a probabilistic structured prediction technique, provides more accurate probabilities, yielding improved inference and increasing information extraction performance.

Publisher

Association for Computing Machinery (ACM)

Subject

General Computer Science

Link

https://dl.acm.org/doi/pdf/10.1145/3365672

References: 36 articles.

1. K. Bache and M. Lichman. 2013. UCI Machine Learning Repository. Retrieved from http://archive.ics.uci.edu/ml.

2. Trading convexity for scalability

Cited by 4 articles.

1. EXACT: How to train your accuracy;Pattern Recognition Letters;2024-09

2. Bibliography;Machine Learning with Noisy Labels;2024

3. Theoretical aspects of noisy-label learning;Machine Learning with Noisy Labels;2024

4. A nonlinear sparse neural ordinary differential equation model for multiple functional processes;Canadian Journal of Statistics;2021-11-16