1. Anderson, R., Stenger, B., Wan, V., Cipolla, R.: Expressive visual text-to-speech using active appearance models. In: Proceedings of the CVPR (2013)
2. Asthana, A., Zafeiriou, S., Cheng, S., Pantic, M.: Incremental face alignment in the wild. In: Proceedings of the CVPR (2014)
3. Benatan, M., Ng, K.: Cross-covariance-based features for speech classification in film audio. J. Vis. Lang. Comput. 31, 215–221 (2015)
4. Black, A., Taylor, P., Caley, R., Clark, R., Richmond, K., King, S., Strom, V., Zen, H.: The Festival speech synthesis system (2001). http://www.cstr.ed.ac.uk/projects/festival/
5. Black, A.W., Lenzo, K.A.: Building synthetic voices. Language Technologies Institute, Carnegie Mellon University and Cepstral LLC (2003)