1. A. Baevski, Y. Zhou, A. Mohamed, and M. Auli. 2020. wav2vec 2.0: A Framework for Self-Supervised Learning of Speech Representations. In Advances in Neural Information Processing Systems (NeurIPS 2020), Vol. 33. Virtual, 12449–12460.
2. Multimodal Machine Learning: A Survey and Taxonomy
3. Increasing the Reliability of Crowdsourcing Evaluations Using Online Quality Assessment
4. C. Busso, Z. Deng, S. Yildirim, M. Bulut, C.M. Lee, A. Kazemzadeh, S. Lee, U. Neumann, and S. Narayanan. 2004. Analysis of Emotion Recognition using Facial Expressions, Speech and Multimodal Information. In Sixth International Conference on Multimodal Interfaces ICMI 2004. ACM Press, State College, PA, 205–211. https://doi.org/10.1145/1027933.1027968
5. C. Busso and S.S. Narayanan. 2006. Interplay between linguistic and affective goals in facial expression during emotional utterances. In 7th International Seminar on Speech Production (ISSP 2006). Ubatuba-SP, Brazil, 549–556.