1. Bahdanau, D., Cho, K., Bengio, Y.: Neural machine translation by jointly learning to align and translate. In: International Conference on Learning Representations (2015)
2. Busso, C., et al.: IEMOCAP: interactive emotional dyadic motion capture database. Lang. Resour. Eval. 42(4), 335–359 (2008)
3. Chen, W., Xing, X., Xu, X., Pang, J., Du, L.: SpeechFormer: a hierarchical efficient framework incorporating the characteristics of speech (2022)
4. Cowie, R., Douglas-Cowie, E., Tsapatsoulis, N., Votsis, G., Taylor, J.: Emotion recognition in HCI. IEEE Signal Process. Mag. (2001)
5. Elman, J.L.: Finding structure in time. Cogn. Sci. 14(2), 179–211 (1990)