Abstract
Active learning allows machine learning models to be trained using fewer labels while retaining performance similar to traditional supervised learning. An active learner selects the most informative data points, requests their labels, and retrains itself. While this approach is promising, it raises the question of how to determine when the model is ‘good enough’ without the additional labels required for traditional evaluation. Various stopping criteria have previously been proposed that aim to identify the optimal stopping point. Yet optimality can only be expressed as a domain-dependent trade-off between accuracy and the number of labels, and no criterion is superior in all applications. As a further complication, comparing criteria for a particular real-world application would require practitioners to collect the very additional labelled data they aim to avoid by using active learning in the first place. This work enables practitioners to employ active learning by providing actionable recommendations for which stopping criteria are best suited to a given real-world scenario. We contribute the first large-scale comparison of stopping criteria for pool-based active learning, using a cost measure to quantify the accuracy/label trade-off; public implementations of all stopping criteria we evaluate; and an open-source framework for evaluating stopping criteria. Our research enables practitioners to substantially reduce labelling costs by using the stopping criterion that best suits their domain.
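To make the pool-based loop and the accuracy/label trade-off concrete, here is a minimal illustrative sketch in Python, assuming scikit-learn. It uses margin-based uncertainty sampling with a naive confidence-based stopping rule; the batch size, threshold, and variable names are hypothetical, and this rule is not one of the criteria evaluated in the paper.

```python
# Illustrative pool-based active learning loop with a simple stopping rule.
# Assumes scikit-learn; parameters and the stopping rule are hypothetical.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X, y = make_classification(n_samples=500, n_features=20, random_state=0)

# Seed set: a few labelled points per class so the first model can be fit.
labeled = []
for c in (0, 1):
    idx = np.flatnonzero(y == c)
    labeled += list(rng.choice(idx, size=5, replace=False))
pool = [i for i in range(len(X)) if i not in labeled]

BATCH, STOP_THRESHOLD = 10, 0.05  # hypothetical trade-off parameters

while pool:
    model = LogisticRegression(max_iter=1000).fit(X[labeled], y[labeled])
    proba = model.predict_proba(X[pool])
    margin = np.abs(proba[:, 0] - proba[:, 1])  # small margin = uncertain
    order = np.argsort(margin)[:BATCH]          # most informative pool points

    # Stopping rule: if even the most uncertain pool points are classified
    # confidently, additional labels are unlikely to pay for themselves.
    if margin[order].mean() > 1 - STOP_THRESHOLD:
        break

    # "Query" the oracle for these labels and move them to the labelled set.
    for i in sorted(order, reverse=True):
        labeled.append(pool.pop(i))

print(f"Stopped after labelling {len(labeled)} of {len(X)} points")
```

Where to set STOP_THRESHOLD is exactly the domain-dependent accuracy/label trade-off the paper quantifies with its cost measure: a stricter threshold buys accuracy at the price of more labels.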
Publisher
Springer Science and Business Media LLC
Subject
Artificial Intelligence, Software
Cited by
5 articles.