1. Recurrent models of visual attention;V Mnih;Proceedings of the 27th International Conference on Neural Information Processing Systems,2014
2. Vid2Seq: Large-scale pretraining of a visual language model for dense video captioning;A Yang;Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition,2023
3. Environment Knowledge-Driven Generic Models to Detect Coughs From Audio Recordings;S Vhaduri;IEEE Open Journal of Engineering in Medicine and Biology,2023