1. Krizhevsky, A., Sutskever, I. and Hinton, G.E. (2012) ImageNet Classification with Deep Convolutional Neural Networks. In: Pereira, F., Burges, C.J., Bottou, L. and Weinberger, K.Q., Eds., Advances in Neural Information Processing Systems (NIPS), Curran Associates, Inc., 1097-1105. http://papers.nips.cc/paper/4824-imagenet-classification-with-deep-convolutional-neural-networks.pdf
2. Wang, J., Yu, K., Dong, C., Loy, C.C. and Qiao, Y. (2020) Vision Transformers. IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, 13-19 June 2020, 1571-1580.
3. Bochkovskiy, A., Wang, C.Y. and Liao, H.Y.M. (2020) YOLOv4: Optimal Speed and Accuracy of Object Detection. arXiv: 2004.10934.
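4. Redmon, J., Divvala, S., Girshick, R. and Farhadi, A. (2016) You Only Look Once: Unified, Real-Time Object Detection. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, 27-30 June 2016, 779-788.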
5. CLDE-Net: crowd localization and density estimation based on CNN and transformer network