Using LSTM neural networks for cross‐lingual phonetic speech segmentation with an iterative correction procedure

Published: 2023-09-19
ISSN: 0824-7935
Journal: Computational Intelligence
Language: English

Authors:

Zdeněk Hanzlíček¹, Jindřich Matoušek¹, Jakub Vít¹

Affiliation:

1. NTIS – New Technologies for the Information Society, Faculty of Applied Sciences, University of West Bohemia, Pilsen, Czech Republic

Abstract

This article describes experiments on speech segmentation using long short-term memory (LSTM) recurrent neural networks. The main part of the paper deals with multi-lingual and cross-lingual segmentation; in the cross-lingual case, segmentation is performed on a language different from the one on which the model was trained. The experimental data comprise large Czech, English, German, and Russian speech corpora intended for speech synthesis. To support multi-lingual modeling, a compact phonetic alphabet was proposed by sharing and clustering the phones of the individual languages. Numerous experiments explored various experimental conditions and data combinations. We also propose a simple procedure that iteratively adapts the initially inaccurate default model to a new voice or language. Segmentation accuracy was evaluated by comparison with a reference segmentation created by a well-tuned hidden Markov model (HMM)-based framework and additionally corrected by hand. The resulting segmentation was also employed in a unit selection text-to-speech system, and the quality of the generated speech was compared with that obtained using the reference segmentation in a preference listening test.
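The abstract reports segmentation accuracy measured against a reference segmentation but does not state the exact measure here. A common choice in the phone segmentation literature is the share of phone boundaries that lie within a fixed tolerance (often 20 ms) of the corresponding reference boundaries. The following is a minimal sketch under that assumption; the function name `boundary_accuracy` and the 20 ms default are illustrative and not taken from the paper, whose evaluation may differ.

```python
from typing import Sequence


def boundary_accuracy(predicted: Sequence[float],
                      reference: Sequence[float],
                      tolerance: float = 0.020) -> float:
    """Hypothetical helper: fraction of boundaries within +/- tolerance seconds.

    Assumes both lists hold the boundary times (in seconds) of the same
    phone sequence in the same order, so the i-th predicted boundary is
    compared with the i-th reference boundary.
    """
    if len(predicted) != len(reference):
        raise ValueError("segmentations must contain the same number of boundaries")
    if not reference:
        raise ValueError("no boundaries to compare")
    # Count boundaries whose absolute deviation from the reference is within tolerance.
    hits = sum(1 for p, r in zip(predicted, reference) if abs(p - r) <= tolerance)
    return hits / len(reference)


if __name__ == "__main__":
    # Illustrative boundary times only, not data from the paper.
    reference = [0.100, 0.250, 0.410, 0.600]
    predicted = [0.105, 0.280, 0.400, 0.612]
    print(boundary_accuracy(predicted, reference))  # 3 of 4 boundaries within 20 ms
```

Pairing boundaries by index works because the phonetic transcription is given in advance, so both segmentations contain the same phone sequence; only the boundary positions differ.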

Funder

Grantová Agentura České Republiky (Czech Science Foundation)

Publisher

Wiley

Subject

Artificial Intelligence, Computational Mathematics

Link

https://onlinelibrary.wiley.com/doi/pdf/10.1111/coin.12602

Cited by 4 articles.

1. The Mason-Alberta Phonetic Segmenter: a forced alignment system based on deep neural networks and interpolation. Phonetica, 2024-09-05.

2. Effects of Training Strategies and the Amount of Speech Data on the Quality of Speech Synthesis. Lecture Notes in Computer Science, 2024.

3. Sentences vs Phrases in Neural Speech Synthesis. Lecture Notes in Computer Science, 2024.

4. Data Alignment and Duration Modelling in VITS. Lecture Notes in Computer Science, 2024.