1. Agrawal, P., Carreira, J., & Malik, J. (2015). Learning to see by moving. In IEEE international conference on computer vision.
2. Andrew, G., Arora, R., Bilmes, J. A., & Livescu, K. (2013). Deep canonical correlation analysis. In International conference on machine learning.
3. Arandjelović, R., & Zisserman, A. (2017). Look, listen and learn. In IEEE international conference on computer vision.
4. Aytar, Y., Vondrick, C., & Torralba, A. (2016). SoundNet: Learning sound representations from unlabeled video. In Advances in neural information processing systems.
5. Bau, D., Zhou, B., Khosla, A., Oliva, A., & Torralba, A. (2017). Network dissection: Quantifying interpretability of deep visual representations. In IEEE conference on computer vision and pattern recognition.