1. Arik SO, Chrzanowski M, Coates A, Diamos G, Gibiansky A, Kang Y et al (2017) Deep Voice: real-time neural text-to-speech. arXiv preprint arXiv:1702.07825v2. https://arxiv.org/abs/1702.07825v2
2. Bennett CL (2005) Large scale evaluation of corpus-based synthesizers: results and lessons from the Blizzard Challenge 2005. In: 9th European conference on speech communication and technology, pp 105–108
3. Black AW (2003) Unit selection and emotional speech. In: EUROSPEECH 2003, 8th European conference on speech communication and technology, vol 3, pp 1649–1652
4. Black AW, Campbell N (1996) Optimising selection of units from speech databases for concatenative synthesis. International Speech Communication Association, vol 1
5. Christensen H, Cunningham SP, Fox C, Green PD, Hain T (2012) A comparative study of adaptive, automatic recognition of disordered speech. In: INTERSPEECH 2012, pp 1776–1779