1. Agostinelli A, Denk TI, Borsos Z, et al., 2023. MusicLM: generating music from text. https://arxiv.org/abs/2301.11325
2. Al-Rfou R, Choe D, Constant N, et al., 2019. Character-level language modeling with deeper self-attention. 33rd AAAI Conf on Artificial Intelligence, p.3159–3166. https://doi.org/10.1609/aaai.v33i01.33013159
3. Ao JY, Wang R, Zhou L, et al., 2022. SpeechT5: unified-modal encoder-decoder pre-training for spoken language processing. Proc 60th Annual Meeting of the Association for Computational Linguistics, p.5723–5738. https://doi.org/10.18653/v1/2022.acl-long.393
4. Brown TB, Mann B, Ryder N, et al., 2020. Language models are few-shot learners. Proc 34th Int Conf on Neural Information Processing Systems, Article 159.
5. Coldewey D, 2022. Try Riffusion, an AI model that composes music by visualizing it. https://techcrunch.com/2022/12/15/try-riffusion-an-ai-model-that-composes-music-by-visualizing-it/ [Accessed on Apr. 6, 2024].