1. Bayoudh, K., Knani, R., Hamdaoui, F., Mtibaa, A.: A survey on deep multimodal learning for computer vision: advances, trends, applications, and datasets. Vis. Comput. 10, 1–32 (2021)
2. Cai, L., Wang, Z., Gao, H., Shen, D., Ji, S.: Deep adversarial learning for multi-modality missing data completion. In: Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, pp. 1158–1166 (2018)
3. Cao, H., Cooper, D.G., Keutmann, M.K., Gur, R.C., Nenkova, A., Verma, R.: CREMA-D: crowd-sourced emotional multimodal actors dataset. IEEE Trans. Affect. Comput. 5(4), 377–390 (2014)
4. Das, R.K., Tao, R., Yang, J., Rao, W., Yu, C., Li, H.: HLT-NUS submission for 2019 NIST multimedia speaker recognition evaluation. In: Proceedings of the APSIPA Annual Summit and Conference, pp. 605–609, December 2020
5. Du, C., et al.: Semi-supervised deep generative modelling of incomplete multi-modality emotional data. In: Proceedings of the 26th ACM International Conference on Multimedia, pp. 108–116, October 2018