[1] “Communication system for defending against attacks of media clones.” http://www2c.comm.eng.osaka-u.ac.jp/proj/mc/eindex.html, 2020.
[2] N. Babaguchi, I. Echizen, J. Yamagishi, N. Nitta, Y. Nakashima, K. Nakamura, K. Kono, F. Fang, S. Myojin, Z. Kuang, H.H. Nguyen, and N.D.T. Tieu, “Preventing fake information generation against media clone attacks,” IEICE Trans. Inf. & Syst., vol.E104-D, no.1, pp.2-11, Jan. 2021. doi:10.1587/transinf.2020MUI0001
[3] A. van den Oord, S. Dieleman, H. Zen, K. Simonyan, O. Vinyals, A. Graves, N. Kalchbrenner, A. Senior, and K. Kavukcuoglu, “WaveNet: a generative model for raw audio,” arXiv preprint arXiv:1609.03499, 2016.
[4] A. Tamamori, T. Hayashi, K. Kobayashi, K. Takeda, and T. Toda, “Speaker-dependent WaveNet vocoder,” Proc. Interspeech, pp.1118-1122, 2017. doi:10.21437/interspeech.2017-314
[5] S.Ö. Arik, M. Chrzanowski, A. Coates, G. Diamos, A. Gibiansky, Y. Kang, X. Li, J. Miller, A. Ng, J. Raiman, S. Sengupta, and M. Shoeybi, “Deep voice: real-time neural text-to-speech,” Proc. International Conference on Machine Learning, pp.195-204, 2017.