1. Baltrušaitis, T., Ahuja, C., & Morency, L.-P. (2018). Multimodal machine learning: A survey and taxonomy. IEEE Transactions on Pattern Analysis and Machine Intelligence, 41(2), 423–443.
2. Bhat, G., Danelljan, M., Van Gool, L., & Timofte, R. (2019). Learning discriminative model prediction for tracking. In Proceedings of the IEEE/CVF International Conference on Computer Vision (pp. 6182–6191).
3. Bhat, S. F., Birkl, R., Wofk, D., Wonka, P., & Müller, M. (2023). ZoeDepth: Zero-shot transfer by combining relative and metric depth. arXiv preprint arXiv:2302.12288.
4. Camplani, M., Hannuna, S. L., Mirmehdi, M., Damen, D., Paiement, A., Tao, L., & Burghardt, T. (2015). Real-time RGB-D tracking with depth scaling kernelised correlation filters and occlusion handling. BMVC, 3, 1–12.
5. Chang, A., Dai, A., Funkhouser, T., Halber, M., Nießner, M., Savva, M., Song, S., Zeng, A., & Zhang, Y. (2017). Matterport3D: Learning from RGB-D data in indoor environments. In 2017 International Conference on 3D Vision (3DV), IEEE Computer Society (pp. 667–676).