1. Aytar Y, Vondrick C, Torralba A (2016) SoundNet: Learning sound representations from unlabeled video. Advances in Neural Information Processing Systems:892–900
2. Barry S, Kim Y (2018) Style transfer for musical audio using multiple time-frequency representations. Unpublished manuscript. https://tinyurl.com/y7nu7r9s
3. Brunner G, Konrad A, Wang Y et al (2018) MIDI-VAE: Modeling dynamics and instrumentation of music with applications to style transfer. arXiv preprint arXiv:1809.07600
4. Ephrat A, Mosseri I, Lang O et al (2018) Looking to listen at the cocktail party: A speaker-independent audio-visual model for speech separation. ACM T Graphic. https://doi.org/10.1145/3197517.3201357
5. Gatys LA, Ecker AS, Bethge M (2015) A neural algorithm of artistic style. arXiv preprint arXiv:1508.06576