1. Anina, 2015. OuluVS2: a multi-view audiovisual database for non-rigid mouth motion analysis.
2. Assael, Y. M., Shillingford, B., Whiteson, S., de Freitas, N., 2016. LipNet: Sentence-level lipreading. arXiv:1611.01599.
3. Buehler, 2009. Learning sign language by watching TV (using weakly aligned subtitles).
4. Chakravarty, P., Tuytelaars, T., 2016. Cross-modal supervision for learning active speaker detection in video. arXiv:1603.08907.
5. Chatfield, 2014. Return of the devil in the details: delving deep into convolutional nets.