Abstract
Recognizing emotions in human speech has long been a challenging problem for researchers. In this work we obtain and apply a parameterized feature vector derived from each sentence by dividing it into an emotionally loaded part and a purely informational part. The expressiveness of human speech is enhanced by the emotion it conveys. Several characteristics and features distinguish utterances from one another, namely prosodic features such as pitch, timbre, loudness, and vocal tone, which allow speech to be categorized into several emotions. We supplement these with a new classification feature of speech: the division of a sentence into an emotionally loaded part and a part that carries only an informational load. A speech sample therefore changes when it is produced in different emotional contexts. Since a speaker's emotional state can be identified on the Mel scale, Mel-frequency cepstral coefficients (MFCC) are one suitable representation for studying the emotional aspects of a speaker's utterances. In this work we implement a model that identifies several emotional states from MFCC features on two datasets, classifies the emotions in each, and compares the results. Overall, the classification model relies on dataset minimization, performed by taking the mean of the features, to improve the classification accuracy of different machine learning algorithms. In addition to the static analysis of a speaker's tonal portrait, used in particular in MFCC, we propose a new method of dynamic phrase analysis that treats the phrase as a new linguistic-emotional entity pronounced by the same speaker. By ranking the Mel-scale features by importance, we are able to parameterize the vector coordinates to be processed by a parameterized KNN method. Speech recognition is a multi-level pattern recognition task in which acoustic signals are analyzed and structured into a hierarchy of elements: words, phrases, and sentences. Each level of this hierarchy may provide additional constraints, such as possible word sequences or known pronunciation patterns, that reduce the number of recognition errors at the lower levels. Analyzing voice and speech dynamics is appropriate for improving both the machine perception and the machine generation of human speech, and lies within the capabilities of artificial intelligence. Emotion recognition results can be widely applied in e-learning platforms, vehicle on-board systems, medicine, and other areas.
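As a minimal sketch of the mean-based dataset minimization described above, assuming a standard MFCC front end (librosa here; the function name `utterance_features`, the variables `wav_paths` and `labels`, and the choice of 13 coefficients are illustrative assumptions rather than the authors' exact configuration), each utterance's frame-level MFCC matrix can be collapsed into a single mean vector:

```python
import numpy as np
import librosa

def utterance_features(path, n_mfcc=13):
    """Load one utterance and reduce its MFCC matrix to a single mean vector."""
    signal, sr = librosa.load(path, sr=None)                     # keep the native sample rate
    mfcc = librosa.feature.mfcc(y=signal, sr=sr, n_mfcc=n_mfcc)  # shape: (n_mfcc, n_frames)
    return mfcc.mean(axis=1)                                     # mean over frames -> (n_mfcc,)

# Hypothetical usage: wav_paths and labels would come from one of the two datasets.
# X = np.stack([utterance_features(p) for p in wav_paths])
# y = np.asarray(labels)
```

Averaging over frames shrinks each utterance from a variable-length matrix to one fixed-length vector, which is the minimization step the abstract credits with improving classification accuracy across machine learning algorithms.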
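One possible reading of the importance-ranked, parameterized KNN is sketched below; using mutual information as the ranking criterion and rescaling the coordinate axes by the resulting weights are our assumptions for illustration, not necessarily the authors' exact method:

```python
import numpy as np
from sklearn.feature_selection import mutual_info_classif
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler

def fit_weighted_knn(X, y, k=5):
    """Fit KNN on coordinates stretched by per-feature importance weights."""
    scaler = StandardScaler().fit(X)
    Xs = scaler.transform(X)
    importance = mutual_info_classif(Xs, y)      # rank each MFCC coordinate (assumed criterion)
    w = importance / (importance.sum() + 1e-12)  # normalize ranks into weights
    Xw = Xs * np.sqrt(w)                         # Euclidean distance on Xw = sum_i w_i * (x_i - z_i)^2
    knn = KNeighborsClassifier(n_neighbors=k).fit(Xw, y)
    return knn, scaler, w

# At prediction time the same transform must be applied:
# knn.predict(scaler.transform(X_new) * np.sqrt(w))
```

Stretching each axis by the square root of its weight makes ordinary Euclidean KNN behave as a weighted-distance classifier, so highly ranked Mel-scale coordinates dominate the neighbor search.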
Publisher
National Academy of Sciences of Ukraine (Co. LTD Ukrinformnauka)
Cited by
9 articles.