1. Alías, F., Socoró, J. C., & Sevillano, X. (2016). A review of physical and perceptual feature extraction techniques for speech, music and environmental sounds. Applied Sciences, 6(5), 143.
2. Bahdanau, D., Cho, K., & Bengio, Y. (2014). Neural machine translation by jointly learning to align and translate. arXiv preprint arXiv:1409.0473.
3. Bertin-Mahieux, T., Eck, D., & Mandel, M. I. (2011). Automatic tagging of audio: The state-of-the-art. In Machine audition: Principles, algorithms and systems. IGI Global.
4. Cho, K., Van Merriënboer, B., Gulcehre, C., Bahdanau, D., Bougares, F., Schwenk, H., & Bengio, Y. (2014). Learning phrase representations using RNN encoder-decoder for statistical machine translation. arXiv preprint arXiv:1406.1078.
5. Choi, K. (2018). Deep neural networks for music tagging. Queen Mary University of London.