1. Owens A, Isola P, McDermott J, et al. Visually indicated sounds[C] //Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Los Alamitos: IEEE Computer Society Press, 2016: 2405-2413.
2. Zhou Y P, Wang Z W, Fang C, et al. Visual to sound: generating natural sound for videos in the wild[C] //Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Los Alamitos: IEEE Computer Society Press, 2018: 3550-3558.
3. Ghose S, Prevost J J. AutoFoley: artificial synthesis of synchronized sound tracks for silent videos with deep learning[J]. IEEE Transactions on Multimedia, 2021, 23: 1895-1907.
4. Iashin V, Rahtu E. Taming visually guided sound generation[EB/OL]. [2022-05-26]. https://arxiv.org/abs/2110.08791.
5. Cheng H N, Li S J, Liu S G. Deep cross-modal synthesis of environmental sound[J]. Journal of Computer-Aided Design & Computer Graphics, 2019, 31: 2047-2055.