Abstract
Audio-visual learning, aimed at exploiting the relationship between audio and visual modalities, has drawn considerable attention since deep learning started to be used successfully. Researchers tend to leverage these two modalities to improve the performance of previously considered single-modality tasks or address new challenging problems. In this paper, we provide a comprehensive survey of recent audio-visual learning development. We divide the current audio-visual learning tasks into four different subfields: audio-visual separation and localization, audio-visual correspondence learning, audio-visual generation, and audio-visual representation learning. State-of-the-art methods, as well as the remaining challenges of each subfield, are further discussed. Finally, we summarize the commonly used datasets and challenges.
Publisher
Springer Science and Business Media LLC
Subject
Applied Mathematics, Computer Science Applications, Modelling and Simulation, Control and Systems Engineering
Cited by
85 articles.