1. [1] Y. Sagisaka, K. Takeda, M. Abe, S. Katagiri, T. Umeda, and H. Kuwabara, “A large-scale Japanese speech database,” Proc. ICSLP, Kobe, Japan, pp.1089-1092, Nov. 1990.
2. [2] Y. Wang, R.J. Skerry-Ryan, D. Stanton, Y. Wu, R.J. Weiss, N. Jaitly, Z. Yang, Y. Xiao, Z. Chen, S. Bengio, Q. Le, Y. Agiomyrgiannakis, R. Clark, and R.A. Saurous, “Tacotron: Towards end-to-end speech synthesis,” Proc. INTERSPEECH, Stockholm, Sweden, pp.4006-4010, Aug. 2017. 10.21437/interspeech.2017-1452
3. [3] H. Zen, K. Tokuda, and A. Black, “Statistical parametric speech synthesis,” Speech Communication, vol.51, no.11, pp.1039-1064, 2009. 10.1016/j.specom.2009.04.004
4. [4] K. Tokuda, T. Kobayashi, T. Masuko, and S. Imai, “Mel-generalized cepstral analysis - a unified approach to speech spectral estimation,” Proc. ICSLP, Yokohama, Japan, pp.410-415, Sept. 1994.
5. [5] P. Zolfaghari and T. Robinson, “Formant analysis using mixtures of Gaussians,” Proc. ICSLP, vol.2, pp.1229-1232, 1996. 10.1109/icslp.1996.607830