Integrating Domain Knowledge in Multi-Source Classification Tasks
Published: 2024-06-29
Volume: 15
Issue: 1
Pages: 591-614
ISSN: 2763-7719
Container-title: Journal on Interactive Systems
Short-container-title: JIS
Authors: Alexandre Thurow Bender, Emillyn Mellyne Gobetti Souza, Ihan Belmonte Bender, Ulisses Brisolara Corrêa, Ricardo Matsumura Araujo
Abstract
This work presents an extended investigation into multi-domain learning techniques for image and audio classification, with a focus on the latter. In machine learning, collections of data obtained or generated under similar conditions are referred to as domains or data sources. The distinct acquisition or generation conditions of these data sources are often overlooked, despite their potential to significantly impact model generalization. Multi-domain learning addresses this challenge by seeking effective methods to train models that perform adequately across all domains seen during training. Our study explores a range of model-agnostic multi-domain learning techniques that leverage explicit domain information alongside class labels. Specifically, we examine three distinct methodologies: a general approach termed Stew, which mixes all available data indiscriminately, and two batch-level domain-regularization methods, Balanced Domains and Loss Sum. These methods are evaluated through several experiments on datasets featuring multiple data sources for audio and image classification tasks. Our findings underscore the importance of considering domain-specific information during training: the Loss Sum method yields notable improvements in model performance (0.79 F1-score) over the conventional approach of blending data from all available domains (0.62 F1-score). By examining the impact of different multi-domain learning techniques on classification tasks, this study contributes to a deeper understanding of effective strategies for leveraging domain knowledge in machine learning model training.
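The three batch strategies the abstract names can be sketched in plain Python. The function names, signatures, and toy numbers below are illustrative only, not taken from the paper; a real implementation would apply these aggregations to per-sample losses inside a training loop:

```python
import random
from collections import defaultdict


def stew_loss(losses, domains):
    """Stew: average every sample loss in the batch,
    ignoring domain labels entirely."""
    return sum(losses) / len(losses)


def loss_sum(losses, domains):
    """Loss Sum: average the losses within each domain, then sum the
    per-domain means, so every domain weighs equally in the batch
    regardless of how many samples it contributed."""
    per_domain = defaultdict(list)
    for loss, domain in zip(losses, domains):
        per_domain[domain].append(loss)
    return sum(sum(v) / len(v) for v in per_domain.values())


def balanced_batch(samples_by_domain, per_domain, rng):
    """Balanced Domains: assemble a batch by drawing the same number
    of samples from every domain."""
    batch = []
    for samples in samples_by_domain.values():
        batch.extend(rng.sample(samples, per_domain))
    return batch


# Toy batch: domain "a" dominates, domain "b" has one hard sample.
losses = [0.2, 0.2, 0.2, 1.0]
domains = ["a", "a", "a", "b"]
print(stew_loss(losses, domains))  # rare domain barely moves the loss
print(loss_sum(losses, domains))   # both domains contribute equally
```

In the toy batch, Stew lets the majority domain dominate the objective, while Loss Sum forces the rare domain's error to count fully, which matches the abstract's motivation for regularizing batches with domain information.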
Publisher
Sociedade Brasileira de Computação (SBC)