Integrating Domain Knowledge in Multi-Source Classification Tasks

Authors:

Bender Alexandre Thurow, Souza Emillyn Mellyne Gobetti, Bender Ihan Belmonte, Corrêa Ulisses Brisolara, Araujo Ricardo Matsumura

Abstract

This work presents an extended investigation of multi-domain learning techniques for image and audio classification, with a focus on audio. In machine learning, collections of data obtained or generated under similar conditions are referred to as domains or data sources. The distinct acquisition or generation conditions of these data sources are often overlooked, however, even though they can significantly affect model generalization. Multi-domain learning addresses this challenge by seeking effective ways to train models that perform adequately across all domains seen during training. Our study explores a range of model-agnostic multi-domain learning techniques that use explicit domain information alongside class labels. Specifically, we examine three methods: Stew, a general approach that mixes all available data indiscriminately, and two batch domain-regularization methods, Balanced Domains and Loss Sum. We evaluate these methods in several experiments on multi-source datasets for audio and image classification. Our findings underscore the importance of considering domain-specific information during training: the Loss Sum method reaches an F1-score of 0.79, compared with 0.62 for the conventional approach of blending data from all available domains. By examining the impact of different multi-domain learning techniques on classification tasks, this study contributes to a deeper understanding of effective strategies for leveraging domain knowledge in machine learning model training.
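The abstract names the three methods without giving their mechanics, so the following is only a rough sketch of how they might differ, not the authors' implementation. It assumes that Stew shuffles all samples together regardless of domain, that Balanced Domains builds each batch from an equal number of samples per domain, and that Loss Sum averages the loss within each domain and then sums the per-domain averages. The function names, the dictionary-based sample format, and the `per_sample_loss` callback are all hypothetical.

```python
import random
from collections import defaultdict


def stew_batches(samples, batch_size, seed=0):
    """Assumed Stew behavior: shuffle every sample together, ignoring domain."""
    rng = random.Random(seed)
    pool = list(samples)
    rng.shuffle(pool)
    return [pool[i:i + batch_size] for i in range(0, len(pool), batch_size)]


def balanced_domain_batches(samples, per_domain, num_batches, seed=0):
    """Assumed Balanced Domains behavior: every batch holds `per_domain`
    samples from each domain, drawn with replacement so that small
    domains are not exhausted."""
    rng = random.Random(seed)
    by_domain = defaultdict(list)
    for sample in samples:
        by_domain[sample["domain"]].append(sample)
    batches = []
    for _ in range(num_batches):
        batch = []
        for domain in sorted(by_domain):
            batch.extend(rng.choices(by_domain[domain], k=per_domain))
        batches.append(batch)
    return batches


def loss_sum(batch, per_sample_loss):
    """Assumed Loss Sum behavior: average the loss inside each domain,
    then sum the per-domain averages, so each domain contributes equally
    no matter how many of its samples are in the batch."""
    losses = defaultdict(list)
    for sample in batch:
        losses[sample["domain"]].append(per_sample_loss(sample))
    return sum(sum(values) / len(values) for values in losses.values())
```

Under these assumptions, a batch dominated by one domain yields a plain mean loss that mostly reflects that domain, whereas `loss_sum` gives each domain the same weight. That is one plausible reading of why domain-aware training could outperform indiscriminate mixing.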

Publisher

Sociedade Brasileira de Computação - SBC

