Speaker verification system based on articulatory information from ultrasound recordings
Authors:
Sepulveda Sepulveda Franklin Alexander, Porras-Plata Dagoberto, Sarria-Paja Milton
Abstract
Current state-of-the-art speaker verification (SV) systems are known to be strongly affected by unexpected variability encountered during testing, such as environmental noise or changes in vocal effort. In this work, we analyze and evaluate articulatory information about tongue movement as a means of improving the performance of SV systems. We use a Spanish database that contains, in addition to the speech signals, articulatory information acquired with an ultrasound system. Two groups of features are proposed to represent the articulatory information, and the resulting performance is compared with that of an SV system trained only on acoustic information. Our results show that the proposed features carry highly discriminative information related to speaker identity; furthermore, these features can complement and improve existing systems when they are combined with cepstral coefficients at the feature level.
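As an illustration of what combining information "at the feature level" can look like, the following is a minimal sketch, not the authors' implementation. It assumes the two feature streams are already time-aligned frame by frame, and it applies per-dimension mean and variance normalization before concatenation, which is an assumption of this sketch rather than something stated in the abstract. The function name `fuse_features` and the dimensions (13 cepstral coefficients, 6 articulatory features) are hypothetical, and the data are random placeholders.

```python
import numpy as np


def fuse_features(cepstral, articulatory):
    """Frame-wise concatenation of two feature streams (feature-level fusion).

    cepstral:     array of shape (n_frames, n_cepstral)
    articulatory: array of shape (n_frames, n_articulatory)
    Both streams are assumed to be time-aligned, one row per frame.
    """
    cepstral = np.asarray(cepstral, dtype=float)
    articulatory = np.asarray(articulatory, dtype=float)
    if cepstral.ndim != 2 or articulatory.ndim != 2:
        raise ValueError("both inputs must be 2-D (frames x features)")
    if cepstral.shape[0] != articulatory.shape[0]:
        raise ValueError("streams must have the same number of frames")

    def normalize(x):
        # Per-dimension mean/variance normalization (an assumption of this
        # sketch) so that neither stream dominates because of its scale.
        std = x.std(axis=0)
        std[std == 0] = 1.0  # avoid division by zero for constant dimensions
        return (x - x.mean(axis=0)) / std

    # Stack the normalized streams side by side: one fused vector per frame.
    return np.hstack([normalize(cepstral), normalize(articulatory)])


# Placeholder data with hypothetical dimensions.
rng = np.random.default_rng(0)
mfcc = rng.normal(size=(200, 13))    # stand-in for cepstral coefficients
tongue = rng.normal(size=(200, 6))   # stand-in for articulatory features
fused = fuse_features(mfcc, tongue)
print(fused.shape)
```

The fused matrix has one row per frame and can be passed to whatever speaker model would otherwise consume the cepstral features alone; the choice of that model is outside the scope of this sketch.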
Publisher
Universidad Nacional de Colombia
Subject
General Engineering