SEAC: Serbian Emotional Amateur Cellphone Speech Corpus
Authors:
Suzić Siniša1, Nosek Tijana1, Sečujski Milan1, Popović Branislav1, Krstanović Lidija1, Vujović Mia1, Simić Nikola1, Janev Marko2, Jakovljević Nikša1, Delić Vlado1
Affiliation:
1. University of Novi Sad, Faculty of Technical Sciences 2. Serbian Academy of Sciences and Arts, Institute of Mathematics
Abstract
Emotional speech recognition and the synthesis of expressive speech are highly dependent on the availability of emotional speech corpora. In this paper, we present the creation and verification of the Serbian Emotional Amateur Cellphone Speech Corpus (SEAC), released by the University of Novi Sad, Faculty of Technical Sciences in 2022 as the first amateur emotional speech corpus in the Serbian language recorded over cellphones. The corpus contains emotional speech elicited from 53 speakers (24 male and 29 female) in 5 emotional states (neutral, happiness, sadness, fear and anger), and its total duration amounts to approximately 8 hours of speech data. Initial objective evaluation of the corpus confirmed a high correlation between the behaviour of acoustic parameters corresponding to different emotional states in the newly recorded corpus and in the existing Serbian emotional speech corpus recorded by 6 professional actors, which was used as a source of reference recordings. The corpus was further evaluated through listening tests concerned with human emotion recognition. Finally, we present the results of emotion recognition and speaker recognition experiments carried out on the corpus with several conventional machine learning systems, as well as the results of a cross-lingual emotion recognition experiment involving a state-of-the-art machine learning system based on deep convolutional neural networks.
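As a rough illustration of the kind of conventional machine learning baseline mentioned in the abstract, the sketch below trains an SVM emotion classifier on utterance-level acoustic features. It is not the authors' published evaluation pipeline: the directory layout (<root>/<emotion>/<utterance>.wav), the feature choice (mean and standard deviation of 13 MFCCs) and the SVM settings are assumptions made for illustration only.

```python
# Illustrative sketch only; the corpus layout, feature set and classifier
# settings below are assumptions, not the authors' published setup.
from pathlib import Path

import librosa
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

EMOTIONS = ["neutral", "happiness", "sadness", "fear", "anger"]

def utterance_features(wav_path, sr=16000):
    """Mean and std of 13 MFCCs -> a fixed-length 26-dim utterance vector."""
    y, _ = librosa.load(wav_path, sr=sr)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)
    return np.concatenate([mfcc.mean(axis=1), mfcc.std(axis=1)])

def load_corpus(root):
    """Assumes one subdirectory per emotion, each holding WAV utterances."""
    X, y = [], []
    for label in EMOTIONS:
        for wav in sorted(Path(root, label).glob("*.wav")):
            X.append(utterance_features(wav))
            y.append(label)
    return np.array(X), np.array(y)

if __name__ == "__main__":
    X, y = load_corpus("SEAC")  # hypothetical path to the corpus
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.2, stratify=y, random_state=0)
    clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=10.0))
    clf.fit(X_tr, y_tr)
    print("emotion recognition accuracy:", clf.score(X_te, y_te))
```

A speaker recognition baseline of the kind reported in the paper could reuse the same features with speaker identities as labels; the cross-lingual deep CNN experiment described in the abstract would require a substantially different setup.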
Publisher
Research Square Platform LLC