1. E. H. Adelson and J. R. Bergen, "Spatiotemporal Receptive Fields of Simultaneously Tuned Neurons in the Retina," Journal of the Optical Society of America, vol. 2, no. 9, pp. 1131–1138, 1985.
2. M. H. Ma and R. Jain, "A Model for Temporal Video Summarization," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 18, no. 8, pp. 744–749, 1996.
3. J. S. Smith and S.-F. Chang, "Visual Storytelling: A Temporal Video Summarization Approach," Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1002–1009, 2005.
4. M. R. Naphade and J. S. Smith, "A Model of Video Summarization Using Keyframes and Associated Meta Data," IEEE Transactions on Multimedia, vol. 7, no. 1, pp. 109–122, 2005.
5. K. Simonyan and A. Zisserman, "Two-Stream Convolutional Networks for Action Recognition in Videos," Advances in Neural Information Processing Systems, vol. 27, 2014.