Abstract
In this chapter we discuss the experimental evaluation of quantification systems. We look at evaluation measures for the various types of quantification systems (binary, single-label multiclass, multi-label multiclass, ordinal), but also at evaluation protocols for quantification, which essentially consist of ways to extract, from a single classification test set, multiple test samples for use in quantification evaluation. The chapter ends with a discussion of how to perform model selection (i.e., hyperparameter optimization) in a quantification-specific way.
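As a concrete illustration of such a protocol, below is a minimal Python sketch of an artificial-prevalence-style sampling scheme: it repeatedly draws, from a single binary classification test set, fixed-size samples whose class prevalence is varied uniformly, so that a quantifier can be evaluated across the whole prevalence spectrum. The function name app_samples, its parameters, and the use of NumPy are illustrative assumptions, not taken from the chapter.

    import numpy as np

    def app_samples(y, n_samples=100, sample_size=500, seed=None):
        # Sketch of an artificial-prevalence-style protocol: draw test
        # samples whose positive-class prevalence is spread uniformly in
        # [0, 1], from a single binary test set with labels `y` in {0, 1}.
        rng = np.random.default_rng(seed)
        pos = np.flatnonzero(y == 1)
        neg = np.flatnonzero(y == 0)
        for p in rng.uniform(0.0, 1.0, size=n_samples):
            n_pos = int(round(p * sample_size))
            idx = np.concatenate([
                rng.choice(pos, n_pos, replace=True),                # positives
                rng.choice(neg, sample_size - n_pos, replace=True),  # negatives
            ])
            rng.shuffle(idx)
            yield idx, n_pos / sample_size  # sample indices, true prevalence

On each generated sample, one would compare the quantifier's estimated prevalence against the returned true prevalence with a measure such as absolute error, and then average over all samples.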
Publisher
Springer International Publishing