1. Aytar, Y., Vondrick, C., Torralba, A.: SoundNet: learning sound representations from unlabeled video. In: Advances in Neural Information Processing Systems 29 (2016)
2. Bellini, R., Kleiman, Y., Cohen-Or, D.: Dance to the beat: synchronizing motion to audio. Comput. Visual Media 4(3), 197–208 (2018)
3. Chen, Q., Wu, Q., Chen, J., Wu, Q., van den Hengel, A., Tan, M.: Scripted video generation with a bottom-up generative adversarial network. IEEE Trans. Image Process. 29, 7454–7467 (2020)
4. Chung, J., Gulcehre, C., Cho, K., Bengio, Y.: Empirical evaluation of gated recurrent neural networks on sequence modeling. arXiv preprint arXiv:1412.3555 (2014)
5. Davis, A., Agrawala, M.: Visual rhythm and beat. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, pp. 2532–2535 (2018)