1. Hassan Akbari, Liangzhe Yuan, Rui Qian, Wei-Hong Chuang, Shih-Fu Chang, Yin Cui, and Boqing Gong. 2021. VATT: Transformers for multimodal self-supervised learning from raw video, audio and text. In Advances in Neural Information Processing Systems. https://openreview.net/forum?id=RzYrn625bu8
2. Multimodal Machine Learning: A Survey and Taxonomy
3. Daniel Cer, Yinfei Yang, Sheng-yi Kong, Nan Hua, Nicole Limtiaco, Rhomni St. John, Noah Constant, Mario Guajardo-Cespedes, Steve Yuan, Chris Tar, Yun-Hsuan Sung, Brian Strope, and Ray Kurzweil. 2018. Universal Sentence Encoder. arXiv:1803.11175 [cs.CL]
4. Chen Baili. 2022. A joint learning Im-BiLSTM model for incomplete time-series Sentinel-2A data imputation and crop classification. International Journal of Applied Earth Observation and Geoinformation.
5. Self-training method based on GCN for semi-supervised short text classification