1. Aaron A, Bakis R, Eide EM, Hamza WM (2014) Systems and methods for text-to-speech synthesis using spoken example. US Patent 8,886,538, 11 Nov 2014
2. Arik SO, Chrzanowski M, Coates A, Diamos G, Gibiansky A, Kang Y, Li X, Miller J, Ng A, Raiman J, et al. (2017) Deep voice: real-time neural text-to-speech. In: Proceedings of the 34th international conference on machine learning (ICML), vol 70, pp 195–204
3. Arik SO, Jun H, Diamos G (2018) Fast spectrogram inversion using multi-head convolutional neural networks. IEEE Signal Process Lett 26(1):94–98
4. Bracewell RN (1986) The Fourier transform and its applications. McGraw-Hill, New York
5. Braunschweiler N, Gales MJF, Buchholz S (2010) Lightly supervised recognition for automatic alignment of large coherent speech recordings. In: INTERSPEECH