Free resources for forced phonetic alignment in Brazilian Portuguese based on Kaldi toolkit

Published:2022-02-19 Issue:1 Volume:2022
ISSN:1687-6180
Container-title:EURASIP Journal on Advances in Signal Processing
language:en
Short-container-title:EURASIP J. Adv. Signal Process.

Author:

Batista Cassio, Dias Ana Larissa, Neto Nelson

Abstract

Phonetic analysis of speech generally requires aligning audio samples to their phonetic transcription. This can be done manually for a handful of files, but as the corpus grows, it becomes infeasibly time-consuming. This paper describes the process of creating free resources for phonetic alignment in Brazilian Portuguese (BP) using Kaldi, a toolkit that achieves state-of-the-art results for open-source speech recognition, within a toolkit we call UFPAlign. The contributions of this work are twofold: developing resources to perform forced alignment in BP, including the release of scripts to train acoustic models via Kaldi, as well as the resources themselves under open licenses; and presenting a comparison with two other phonetic aligners that provide resources for BP, namely EasyAlign and Montreal Forced Aligner (MFA), the latter also being Kaldi-based. Evaluation was carried out in terms of phone boundary and intersection over union metrics on a dataset of 385 hand-aligned utterances. Results show that Kaldi-based aligners perform better overall, and that UFPAlign models are more accurate than MFA's. Furthermore, complex deep-learning-based approaches still do not improve performance compared to simpler models.
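
The two evaluation metrics named in the abstract can be illustrated with a minimal Python sketch. It assumes each phone is an interval of (start, end) times in seconds, that the reference and predicted alignments contain the same phones in the same order, and that the phone boundary metric compares end boundaries. The function names, the tolerance parameter, and the sample intervals are hypothetical and are not taken from UFPAlign; the paper's exact definitions may differ.

```python
from typing import List, Tuple

# Assumed representation: one phone = (start_seconds, end_seconds).
Interval = Tuple[float, float]


def boundary_errors(ref: List[Interval], hyp: List[Interval]) -> List[float]:
    """Absolute difference between reference and predicted end boundaries."""
    if len(ref) != len(hyp):
        raise ValueError("reference and hypothesis must have the same number of phones")
    return [abs(r[1] - h[1]) for r, h in zip(ref, hyp)]


def interval_iou(ref: Interval, hyp: Interval) -> float:
    """Intersection over union of two time intervals (0.0 if they do not overlap)."""
    intersection = max(0.0, min(ref[1], hyp[1]) - max(ref[0], hyp[0]))
    union = (ref[1] - ref[0]) + (hyp[1] - hyp[0]) - intersection
    return intersection / union if union > 0 else 0.0


def tolerance_rate(errors: List[float], tol_s: float) -> float:
    """Fraction of boundaries whose error is within tol_s seconds (hypothetical helper)."""
    if not errors:
        raise ValueError("no boundaries to evaluate")
    return sum(e <= tol_s for e in errors) / len(errors)


# Made-up example: two hand-aligned phones and an aligner's prediction.
reference = [(0.00, 0.10), (0.10, 0.25)]
predicted = [(0.00, 0.12), (0.12, 0.25)]

errors = boundary_errors(reference, predicted)
ious = [interval_iou(r, h) for r, h in zip(reference, predicted)]
```

Under these assumptions, a smaller boundary error and an IoU closer to 1 both indicate that the predicted phone segment lies closer to the hand-aligned one.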

Funder

Coordenação de Aperfeiçoamento de Pessoal de Nível Superior

Conselho Nacional de Desenvolvimento Científico e Tecnológico

Publisher

Springer Science and Business Media LLC

Subject

General Medicine

Link

https://link.springer.com/content/pdf/10.1186/s13634-022-00844-9.pdf

References (75 articles)

1. J.-P. Goldman, EasyAlign: an automatic phonetic alignment tool under Praat, in Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH, pp. 3233–3236 (2011)

2. M. McAuliffe, M. Socolof, S. Mihuc, M. Wagner, M. Sonderegger, Montreal Forced Aligner: trainable text-speech alignment using Kaldi, in Proceedings of Interspeech, pp. 498–502 (2017). https://doi.org/10.21437/Interspeech.2017-1386