Abstract
In automatic speech recognition, the data used to train a system can differ significantly from the data actually obtained from its users. However, other data that more closely resemble the target conditions may be available. Transfer learning can exploit such data during training to improve the recognizer's performance in a particular domain. This paper presents a few applications of transfer learning to speech recognition, specifically for the Serbian language. Several methods are proposed, with the goal of optimizing system performance on a specific part of the existing speech database for Serbian, or in a noisy environment. Experimental results on a test set from the target domain show significant improvements in both word error rate and character error rate.
Publisher
Centre for Evaluation in Education and Science (CEON/CEES)
Subject
Computer Networks and Communications, Media Technology, Radiation, Signal Processing, Software