1. Understanding Humans in Crowded Scenes
2. BERT: Pre-training of deep bidirectional transformers for language understanding;Devlin;Proceedings of the Annual Conference of the North American Chapter of the Association for Computational Linguistics,2019
3. Framewise phoneme classification with bidirectional LSTM and other neural network architectures
4. Very deep convolutional networks for large-scale image recognition;Simonyan;arXiv preprint arXiv:1409.1556,2014
5. TIPCB: A Simple but Effective Part-based Convolutional Baseline for Text-based Person Search;Chen;arXiv,2021