1. Bochkovskiy, A., Wang, C.-Y., Liao, H.-Y.M.: YOLOv4: optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934 (2020)
2. MOT15 results. https://motchallenge.net/results/MOT15/. Accessed 20 Sept 2022
3. Kolve, E., et al.: AI2-THOR: an interactive 3D environment for visual AI. arXiv preprint arXiv:1712.05474 (2017)
4. Batra, D., et al.: Rearrangement: a challenge for embodied AI. arXiv preprint arXiv:2011.01975 (2020)
5. Hall, D., et al.: The robotic vision scene understanding challenge. arXiv preprint arXiv:2009.05246 (2020)