1. Multi-view learning review: understanding methods and their application;Bae;The Korean Journal of Applied Statistics,2019
2. Integrating scene text and visual appearance for fine-grained image classification;Bai;IEEE Access,2018
3. Multimodal machine learning: a survey and taxonomy;Baltrušaitis;IEEE Transactions on Pattern Analysis and Machine Intelligence,2019
4. ImageNet: a large-scale hierarchical image database;Deng,2009
5. Frome, A., Corrado, G. S., Shlens, J., Bengio, S., Dean, J., Mikolov, T., et al. (2013). DeViSE: A deep visual-semantic embedding model. In Advances in Neural Information Processing Systems (pp. 2121–2129). http://papers.nips.cc/paper/5204-devise-a-deep-visual-sem.