1. Hassan Akbari, Liangzhe Yuan, Rui Qian, Wei-Hong Chuang, Shih-Fu Chang, Yin Cui, and Boqing Gong. 2021. VATT: Transformers for multimodal self-supervised learning from raw video, audio and text. In Proceedings of the 35th Annual Conference on Neural Information Processing Systems.
2. Pierre Baldi. 2012. Autoencoders, unsupervised learning, and deep architectures. In Proceedings of ICML Workshop on Unsupervised and Transfer Learning. 37–49.
3. OpenFace 2.0: Facial Behavior Analysis Toolkit
4. IEMOCAP: interactive emotional dyadic motion capture database
5. Deep Adversarial Learning for Multi-Modality Missing Data Completion