1) G. Hinton, L. Deng, D. Yu, G. Dahl, A.-r. Mohamed, N. Jaitly, A. Senior, V. Vanhoucke, P. Nguyen, T. Sainath and B. Kingsbury, ``Deep neural networks for acoustic modeling in speech recognition: The shared views of four research groups,'' IEEE Signal Process. Mag., 29, 82-97 (2012).
2) A. van den Oord, S. Dieleman, H. Zen, K. Simonyan, O. Vinyals, A. Graves, N. Kalchbrenner, A. W. Senior and K. Kavukcuoglu, ``WaveNet: A generative model for raw audio,'' arXiv preprint arXiv:1609.03499 (2016).
3) S. Takamichi, T. Koriyama and H. Saruwatari, ``Sampling-based speech parameter generation using moment-matching networks,'' Proc. Interspeech, Stockholm, Sweden, Aug., pp. 3961-3965 (2017).
4) Y. Saito, S. Takamichi and H. Saruwatari, ``Statistical parametric speech synthesis incorporating generative adversarial networks,'' IEEE/ACM Trans. Audio Speech Lang. Process., 26, 84-96 (2018).
5) M. Abe, Y. Sagisaka, T. Umeda and H. Kuwabara, ``Speech database user manual,'' ATR Tech. Rep., No. TR-I-0166M (1990).