Voice adaptation by color-encoded frame matching as a multi-objective optimization problem for future games

Published: 2022-01-04
Volume: 8, Issue: 2, Pages: 1539–1550
ISSN: 2199-4536
Journal: Complex & Intelligent Systems (abbreviated: Complex Intell. Syst.)
Language: English

Authors:

Mads Midtlyng, Yuji Sato, Hiroshi Hosobe

Abstract

Voice adaptation is an interactive speech-processing technique that lets a speaker talk in a chosen target voice. We propose a novel method intended for dynamic scenarios, such as online video games, where the source and target speakers' data are nonaligned. Such a method would greatly improve immersion and the player experience by letting players fully become their character, and it addresses privacy concerns by disguising the voice to protect against harassment. With unaligned data, traditional methods such as probabilistic models become inaccurate, while recent methods such as deep neural networks (DNNs) require substantial preparation work. Common methods also require multiple subjects to be trained in parallel, which limits their practicality in production environments. Our proposal trains a single subject on nonparallel data into a voice profile that can then be applied to any unknown source speaker. Prosodic data such as pitch, power, and temporal structure are encoded into RGBA-colored frames, which are used in a multi-objective optimization problem that adjusts interrelated features based on color likeness. Finally, the frames are smoothed and adjusted before output. The method was evaluated with Mean Opinion Score (MOS), ABX, MUSHRA, and Single Ease Question (SEQ) tests and with performance benchmarks, using two voice profiles of different sizes; game implementation is also discussed. Results show improved adaptation quality, especially with the larger voice profile, and the audience was positive about using such technology in future games.
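The abstract does not specify how prosodic features map to color channels or how the multi-objective problem is solved. As a rough illustration only, the Python sketch below assumes a hypothetical mapping (R = pitch, G = power, B = duration, A = voicing flag), fixed normalization ranges, and a weighted-sum scalarization of the per-channel objectives with nearest-frame selection. A centered moving average stands in for the unspecified smoothing step. Every name here (`Frame`, `encode`, `color_cost`, `match`, `smooth`, the `*_RANGE` constants) is hypothetical and is not the authors' method.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple

RGBA = Tuple[int, int, int, int]


@dataclass
class Frame:
    pitch_hz: float     # fundamental frequency; 0.0 marks an unvoiced frame
    power_db: float     # frame energy in dB
    duration_ms: float  # temporal length of the frame/segment


# Hypothetical normalization ranges (assumptions, not taken from the paper).
PITCH_RANGE = (50.0, 500.0)
POWER_RANGE = (-60.0, 0.0)
DURATION_RANGE = (5.0, 200.0)


def _to_byte(value: float, lo: float, hi: float) -> int:
    """Clamp value to [lo, hi] and scale it linearly to 0..255."""
    value = min(max(value, lo), hi)
    return round(255 * (value - lo) / (hi - lo))


def encode(frame: Frame) -> RGBA:
    """Pack prosodic features into one RGBA color (hypothetical mapping)."""
    voiced = frame.pitch_hz > 0
    r = _to_byte(frame.pitch_hz, *PITCH_RANGE) if voiced else 0
    g = _to_byte(frame.power_db, *POWER_RANGE)
    b = _to_byte(frame.duration_ms, *DURATION_RANGE)
    a = 255 if voiced else 0  # alpha channel used as a voicing flag
    return (r, g, b, a)


def color_cost(src: RGBA, tgt: RGBA, weights: Sequence[float]) -> float:
    """Weighted sum of per-channel squared differences.

    Each channel is treated as one objective; the weighted sum is a simple
    scalarization, not necessarily the optimizer used in the paper.
    """
    return sum(w * (s - t) ** 2 for w, s, t in zip(weights, src, tgt))


def match(source: List[RGBA], profile: List[RGBA],
          weights: Sequence[float] = (1.0, 1.0, 1.0, 4.0)) -> List[int]:
    """For each source color, return the index of the closest profile color."""
    return [min(range(len(profile)),
                key=lambda j: color_cost(c, profile[j], weights))
            for c in source]


def smooth(values: List[float], window: int = 3) -> List[float]:
    """Centered moving average; the window shrinks at the edges."""
    half = window // 2
    out = []
    for i in range(len(values)):
        seg = values[max(0, i - half): i + half + 1]
        out.append(sum(seg) / len(seg))
    return out


if __name__ == "__main__":
    # A tiny hypothetical voice profile and two source frames.
    profile_frames = [Frame(120.0, -20.0, 40.0),
                      Frame(220.0, -12.0, 30.0),
                      Frame(0.0, -45.0, 20.0)]
    profile = [encode(f) for f in profile_frames]
    source = [encode(Frame(210.0, -14.0, 35.0)),
              encode(Frame(0.0, -50.0, 25.0))]
    indices = match(source, profile)
    print(indices)  # [1, 2]
    # Smooth the pitch contour taken from the matched profile frames.
    print(smooth([profile_frames[i].pitch_hz for i in indices]))
```

Weighting alpha more heavily keeps voiced source frames from being matched to unvoiced profile frames; raising the pitch weight instead would favor intonation over loudness. A complete system would also resynthesize audio from the matched frames, which this sketch omits.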

Funder

Japan Society for the Promotion of Science

Publisher

Springer Science and Business Media LLC

Subject

Computational Mathematics, Engineering (miscellaneous), Information Systems, Artificial Intelligence

Link

https://link.springer.com/content/pdf/10.1007/s40747-021-00604-6.pdf
