1. M. Baldonado, C.-C.K. Chang, L. Gravano, A. Paepcke, The Stanford Digital Library Metadata Architecture. Int. J. Digit. Libr. 1, 108–121 (1997)
2. H. Tachibana, K. Uenoyama, S. Aihara, Efficiently trainable text-to-speech system based on deep convolutional networks with guided attention, in IEEE ICASSP (2018). arXiv:1710.08969
3. S.O. Arik, J. Chen, K. Peng, W. Ping, Y. Zhou, Neural voice cloning with a few samples, in Advances in Neural Information Processing Systems (NeurIPS) (2018). arXiv:1802.06006
4. G. Ruggiero, E. Zovato, L. Di Caro, V. Pollet, Voice cloning: a multi-speaker text-to-speech synthesis approach based on transfer learning, in IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (2021). arXiv:2102.05630
5. J.S. Chung, A. Jamaludin, A. Zisserman, You said that? arXiv preprint (2017). arXiv:1705.02966