Authors:
Pantelidis Nick, Pegia Maria, Galanopoulos Damianos, Apostolidis Konstantinos, Stavrothanasopoulos Klearchos, Moumtzidou Anastasia, Gkountakos Konstantinos, Gialampoukidis Ilias, Vrochidis Stefanos, Mezaris Vasileios, Kompatsiaris Ioannis, Jónsson Björn Þór
Publisher:
Springer Nature Switzerland