1. Multi-Task Collaborative Network for Joint Referring Expression Comprehension and Segmentation
2. Decoupled weight decay regularization;loshchilov;ArXiv Preprint,2017
3. Generation and Comprehension of Unambiguous Object Descriptions
4. UniVL: A unified video and language pre-training model for multimodal understanding and generation;luo;ArXiv Preprint,2020
5. MViTv2: Improved multiscale vision transformers for classification and detection;li;Proc IEEE Conf Computer Vision and Pattern Recognition,2022