1. Chengxin Chen, Meng Wang, and Pengyuan Zhang. 2022. Audio-Visual Scene Classification Using A Transfer Learning Based Joint Optimization Strategy. arXiv preprint arXiv:2204.11420 (2022).
2. François Chollet. 2015. Keras. https://keras.io.
3. Joon Son Chung, A. Senior, Oriol Vinyals, and Andrew Zisserman. 2017. Lip Reading Sentences in the Wild. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR). 3444–3453.
4. Detection and Classification of Acoustic Scenes and Events (DCASE) Community. 2021. DCASE Challenges Task 1A. http://dcase.community/challenge2021.
5. ActivityNet: A large-scale video benchmark for human activity understanding