Abstract
Multi-modal learning has gained significant attention due to its ability to enhance machine learning algorithms. However, it brings challenges related to modality heterogeneity and domain shift. In this work, we address these challenges by proposing a new loss, Relative Norm Alignment (RNA). RNA loss exploits the observation that variations in marginal distributions between modalities manifest as discrepancies in their mean feature norms, and it rebalances feature norms across domains, modalities, and classes. This rebalancing improves the accuracy of models on test data from unseen (“target”) distributions. In the context of Unsupervised Domain Adaptation (UDA), we use unlabeled target data to enhance feature transferability. We achieve this by combining RNA loss with an adversarial domain loss and an Information Maximization term that regularizes predictions on target data. We present a comprehensive analysis and ablation of our method for both Domain Generalization and UDA settings, testing our approach on different modalities for tasks such as first- and third-person action recognition, object recognition, and fatigue detection. Experimental results show that our approach achieves competitive or state-of-the-art performance on the proposed benchmarks, demonstrating its versatility and effectiveness across a wide range of applications.
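The abstract does not state the RNA formula, so the following is only an illustrative sketch of the idea of rebalancing mean feature norms between two modalities. It assumes, hypothetically, that the penalty is the squared deviation from 1 of the ratio of the two mean L2 feature norms; the function names, the `eps` term, and the toy feature values are all assumptions introduced for illustration, not the authors' implementation.

```python
import math

def mean_feature_norm(features):
    """Mean L2 norm over a batch of feature vectors (a list of lists)."""
    norms = [math.sqrt(sum(x * x for x in f)) for f in features]
    return sum(norms) / len(norms)

def rna_loss_sketch(feats_a, feats_b, eps=1e-8):
    """Hypothetical relative-norm penalty (an assumption, not the paper's code).

    It is zero when the two modalities have equal mean feature norms and
    grows as the ratio of the mean norms moves away from 1.
    """
    ratio = mean_feature_norm(feats_a) / (mean_feature_norm(feats_b) + eps)
    return (ratio - 1.0) ** 2

# Toy batches: the first modality has much larger feature norms than the second.
modality_a = [[3.0, 4.0], [6.0, 8.0]]   # norms 5 and 10, mean 7.5
modality_b = [[0.6, 0.8], [1.2, 1.6]]   # norms 1 and 2, mean 1.5

imbalanced = rna_loss_sketch(modality_a, modality_b)  # ratio is about 5
balanced = rna_loss_sketch(modality_a, modality_a)    # ratio is about 1
```

Under this assumed form, minimizing the penalty pushes the two mean norms toward each other without fixing either to a preset value, which matches the "relative" alignment the abstract describes.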
Publisher
Springer Science and Business Media LLC