Abstract
Sound event detection, speech emotion classification, music classification, acoustic scene classification, audio tagging, and several other audio pattern recognition applications depend heavily on advances in machine learning, and neural networks have recently been applied to these problems as well. Existing systems, however, operate on specific datasets of limited duration. In natural language processing and computer vision, systems pretrained on large datasets have performed well across many tasks in recent years, but audio pattern recognition research with large-scale datasets remains limited. In this paper, a large-scale audio dataset is used to train a pretrained audio neural network, which is then transferred to several audio-related tasks. Several convolutional neural networks are used to model the proposed audio neural network, and the computational complexity and performance of the system are analyzed. The raw waveform and the log-mel spectrogram serve as input features in this architecture. On audio tagging, the proposed system outperforms existing systems with a mean average precision (mAP) of 0.45. The performance of the proposed model is demonstrated by applying the audio neural network to five specific audio pattern recognition tasks.
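The abstract names the log-mel spectrogram as one of the network's input features. As an illustration only (the paper's exact front end is not given here), the sketch below computes a log-mel spectrogram from a raw waveform using NumPy; the parameter values (32 kHz sample rate, 1024-point FFT, 320-sample hop, 64 mel bins) are assumed, not taken from the paper.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(sample_rate, n_fft, n_mels):
    # Triangular filters spaced evenly on the mel scale from 0 Hz to Nyquist.
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_points) / sample_rate).astype(int)
    fbank = np.zeros((n_mels, n_fft // 2 + 1))
    for i in range(1, n_mels + 1):
        left, center, right = bins[i - 1], bins[i], bins[i + 1]
        for k in range(left, center):
            if center > left:
                fbank[i - 1, k] = (k - left) / (center - left)
        for k in range(center, right):
            if right > center:
                fbank[i - 1, k] = (right - k) / (right - center)
    return fbank

def log_mel_spectrogram(waveform, sample_rate=32000, n_fft=1024,
                        hop_length=320, n_mels=64):
    # Frame the signal with a Hann window and take the magnitude-squared STFT.
    window = np.hanning(n_fft)
    n_frames = 1 + (len(waveform) - n_fft) // hop_length
    frames = np.stack([waveform[i * hop_length:i * hop_length + n_fft] * window
                       for i in range(n_frames)])
    power = np.abs(np.fft.rfft(frames, n_fft)) ** 2
    # Project each frame's power spectrum onto the mel filterbank, then
    # apply log compression; the small epsilon avoids log(0) on silence.
    mel = power @ mel_filterbank(sample_rate, n_fft, n_mels).T
    return np.log(mel + 1e-10)
```

The resulting matrix (frames × mel bins) is the kind of two-dimensional time-frequency representation that convolutional neural networks consume as an image-like input.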
Publisher
Inventive Research Organization
Cited by 2 articles.