SEAC: Serbian Emotional Amateur Cellphone Speech Corpus
Authors:
Suzić Siniša1, Nosek Tijana1, Sečujski Milan1, Popović Branislav1, Krstanović Lidija1, Vujović Mia1, Simić Nikola1, Janev Marko2, Jakovljević Nikša1, Delić Vlado1
Affiliation:
1. University of Novi Sad, Faculty of Technical Sciences 2. Serbian Academy of Sciences and Arts, Institute of Mathematics
Abstract
Emotional speech recognition and the synthesis of expressive speech are highly dependent on the availability of emotional speech corpora. In this paper, we present the creation and verification of the Serbian Emotional Amateur Cellphone Speech Corpus (SEAC), released by the University of Novi Sad, Faculty of Technical Sciences in 2022 as the first amateur emotional speech corpus in the Serbian language recorded over cellphones. The corpus contains emotional speech elicited from 53 speakers (24 male and 29 female) in 5 emotional states (neutral, happiness, sadness, fear and anger), and its total duration amounts to approximately 8 hours of speech data. Initial objective evaluation of the corpus confirmed a high correlation between the behaviour of acoustic parameters corresponding to different emotional states in the newly recorded corpus and in the existing Serbian emotional speech corpus recorded by 6 professional actors, which was used as a source of reference recordings. The corpus was further evaluated through listening tests concerned with human emotion recognition. Finally, we present the results of emotion recognition and speaker recognition experiments carried out on the corpus with several conventional machine learning systems, as well as the results of a cross-lingual emotion recognition experiment involving a state-of-the-art machine learning system based on deep convolutional neural networks.
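As a rough illustration of the kind of conventional machine learning baseline mentioned in the abstract, the sketch below trains an SVM emotion classifier on utterance-level acoustic features. It is not the authors' published evaluation pipeline: the directory layout (<root>/<emotion>/<utterance>.wav), the feature choice (mean and standard deviation of 13 MFCCs) and the SVM settings are assumptions made for illustration only.

```python
# Illustrative sketch only; the corpus layout, feature set and classifier
# settings below are assumptions, not the authors' published setup.
from pathlib import Path

import librosa
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

EMOTIONS = ["neutral", "happiness", "sadness", "fear", "anger"]

def utterance_features(wav_path, sr=16000):
    """Mean and std of 13 MFCCs -> a fixed-length 26-dim utterance vector."""
    y, _ = librosa.load(wav_path, sr=sr)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)
    return np.concatenate([mfcc.mean(axis=1), mfcc.std(axis=1)])

def load_corpus(root):
    """Assumes one subdirectory per emotion, each holding WAV utterances."""
    X, y = [], []
    for label in EMOTIONS:
        for wav in sorted(Path(root, label).glob("*.wav")):
            X.append(utterance_features(wav))
            y.append(label)
    return np.array(X), np.array(y)

if __name__ == "__main__":
    X, y = load_corpus("SEAC")  # hypothetical path to the corpus
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.2, stratify=y, random_state=0)
    clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=10.0))
    clf.fit(X_tr, y_tr)
    print("emotion recognition accuracy:", clf.score(X_te, y_te))
```

A speaker recognition baseline of the kind reported in the paper could reuse the same features with speaker identities as labels; the cross-lingual deep CNN experiment described in the abstract would require a substantially different setup.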
Publisher
Research Square Platform LLC