Author:
Nandakishor Salam, Pati Debadatta
Publisher:
Springer Nature Switzerland
References (30 articles):
1. Wu, P., et al.: A novel lip descriptor for audio-visual keyword spotting based on adaptive decision fusion. IEEE Trans. Multimedia 18(3), 326–338 (2016)
2. Nandakishor, S., Pati, D.: Analysis of lombard effect by using hybrid visual features for ASR. In: Pattern Recognition and Machine Intelligence (PReMI 2021) (2021)
3. Higuchi, T., Gupta, A., Dhir, C.: Multi-task learning with cross attention for keyword spotting. In: IEEE Automatic Speech Recognition and Understanding Workshop (2021)
4. Berg, A., O'Connor, M., Cruz, M.T.: Keyword transformer: a self-attention model for keyword spotting. In: Proceedings of INTERSPEECH (2021)
5. Li, Y., et al.: Audio-visual keyword transformer for unconstrained sentence-level keyword spotting. CAAI Transactions on Intelligence Technology (2023)