1. Baltrusaitis, T., Ahuja, C., Morency, L.: Multimodal machine learning: a survey and taxonomy. IEEE Trans. Pattern Anal. Mach. Intell. 41(2), 423–443 (2019). https://doi.org/10.1109/TPAMI.2018.2798607
2. Bayoudh, K., Knani, R., Hamdaoui, F., Mtibaa, A.: A survey on deep multimodal learning for computer vision: advances, trends, applications, and datasets. Vis. Comput. 38(8), 2939–2970 (2022). https://doi.org/10.1007/s00371-021-02166-7
3. Bregler, C., Covell, M., Slaney, M.: Video rewrite: driving visual speech with audio. In: Owen, G.S., Whitted, T., Mones-Hattal, B. (eds.) Proceedings of the 24th Annual Conference on Computer Graphics and Interactive Techniques, SIGGRAPH 1997, Los Angeles, CA, USA, 3–8 August 1997, pp. 353–360. ACM, New York (1997). https://doi.org/10.1145/258734.258880
4. Breiman, L.: Random forests. Mach. Learn. 45(1), 5–32 (2001). https://doi.org/10.1023/A:1010933404324
5. Cao, Y., Steffey, S., He, J., Xiao, D., Tao, C., Chen, P., Müller, H.: Medical image retrieval: a multimodal approach. Cancer Inform. 13s3, CIN.S14053 (2014). https://doi.org/10.4137/CIN.S14053. PMID: 26309389