1. Joint Visual and Audio Learning for Video Highlight Detection
2. Contrastive Learning for Unsupervised Video Highlight Detection
3. Tom Brown , Benjamin Mann , Nick Ryder , Melanie Subbiah , Jared D Kaplan , Prafulla Dhariwal , Arvind Neelakantan , Pranav Shyam , Girish Sastry , Amanda Askell , 2020. Language models are few-shot learners. Advances in neural information processing systems 33 ( 2020 ), 1877–1901. Tom Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared D Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, 2020. Language models are few-shot learners. Advances in neural information processing systems 33 (2020), 1877–1901.
4. Ting Chen , Simon Kornblith , Mohammad Norouzi , and Geoffrey Hinton . 2020 . A simple framework for contrastive learning of visual representations . In International conference on machine learning. PMLR, 1597–1607 . Ting Chen, Simon Kornblith, Mohammad Norouzi, and Geoffrey Hinton. 2020. A simple framework for contrastive learning of visual representations. In International conference on machine learning. PMLR, 1597–1607.
5. Peng Gao , Shijie Geng , Renrui Zhang , Teli Ma , Rongyao Fang , Yongfeng Zhang , Hongsheng Li , and Yu Qiao . 2021 . Clip-adapter: Better vision-language models with feature adapters. arXiv preprint arXiv:2110.04544 (2021). Peng Gao, Shijie Geng, Renrui Zhang, Teli Ma, Rongyao Fang, Yongfeng Zhang, Hongsheng Li, and Yu Qiao. 2021. Clip-adapter: Better vision-language models with feature adapters. arXiv preprint arXiv:2110.04544 (2021).