Affiliation:
1. Department of Computer Science, University of Calgary, Calgary, Alberta, Canada
Abstract
Human aesthetics play a significant role in video game development, emotion‐aware robot design, online recommender systems, digital humans, and other research domains focused on human‐computer interaction. Recognizing social network users from their aesthetic preferences is an emerging research domain. In this paper, a novel deep learning architecture is proposed for multi‐modal audio‐visual person identification that combines audio and visual aesthetic features. A pre‐trained ResNet architecture is used to extract high‐level features from a set of user‐preferred audio and image samples. A novel deep learning‐based fusion technique called residual‐aided intermediate fusion (RAIF) is introduced to merge the audio and visual features effectively. The proposed RAIF method achieved an accuracy of 98% and a loss of 0.01 on a proprietary multi‐modal dataset, indicating its effectiveness in fusing audio and visual information.
Funder
Natural Sciences and Engineering Research Council of Canada
Subject
Computer Graphics and Computer-Aided Design, Software
Cited by
4 articles.