Abstract
In the era of automated and digitized information, advanced computer applications handle a major share of data containing audio-related information. Advances in technology have ushered in a new era in which devices can deliver comprehensive insights into audio content, leveraging algorithms such as Mel Frequency Cepstral Coefficients (MFCCs) and the Short-Time Fourier Transform (STFT) to extract pertinent information. Our study supports not only efficient audio file management and retrieval but also plays a vital role in security, the robotics industry, and investigations. Beyond these industrial applications, our model shows remarkable versatility in the corporate sector, for instance in siren-sound detection. Embracing this capability promises to catalyze the development of advanced automated systems, paving the way for increased efficiency and safety across corporate domains. The primary aim of our experiment is to create highly efficient audio classification models that can be seamlessly automated and deployed in the industrial sector, addressing critical needs for enhanced productivity and performance. Despite the dynamic nature of environmental sounds and the presence of noise, the presented audio classification model proves efficient and accurate. The novelty of our work lies in comparing two different audio datasets with similar characteristics, classifying the audio signals into several categories using various machine learning techniques, and extracting MFCC and STFT features from the signals. We also evaluated the results before and after noise removal to analyze the effect of noise on precision, recall, specificity, and F1-score.
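The four reported metrics all derive from a binary confusion matrix. As a minimal sketch (not the authors' evaluation code, and with a made-up toy label set), they can be computed from true/predicted labels as follows:

```python
import numpy as np

def binary_metrics(y_true, y_pred):
    """Precision, recall (sensitivity), specificity, and F1 from binary labels."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    tp = np.sum((y_true == 1) & (y_pred == 1))  # true positives
    tn = np.sum((y_true == 0) & (y_pred == 0))  # true negatives
    fp = np.sum((y_true == 0) & (y_pred == 1))  # false positives
    fn = np.sum((y_true == 1) & (y_pred == 0))  # false negatives
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    specificity = tn / (tn + fp)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, specificity, f1

# Toy example, illustrative only
p, r, s, f1 = binary_metrics([1, 1, 0, 0, 1, 0], [1, 0, 0, 1, 1, 0])
# here each metric equals 2/3
```

For the multi-class setting described in the paper, these per-class values would typically be macro-averaged across categories.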
Our experiment shows that the ANN model outperforms the other six audio models, with accuracies of 91.41% and 91.27% on the respective datasets.
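The STFT and MFCC features at the core of the pipeline can be sketched with NumPy/SciPy alone. This is a simplified illustration of the standard definitions (windowed FFT, mel filterbank, log, DCT), not the paper's actual implementation; frame length, hop size, and filter count here are assumed values:

```python
import numpy as np
from scipy.fftpack import dct

def stft(signal, frame_len=512, hop=256):
    """Short-Time Fourier Transform: windowed FFT over overlapping frames."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([signal[i * hop : i * hop + frame_len] * window
                       for i in range(n_frames)])
    return np.fft.rfft(frames, axis=1)  # shape: (n_frames, frame_len // 2 + 1)

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filters, frame_len, sr):
    """Triangular filters spaced evenly on the mel scale."""
    mel_pts = np.linspace(hz_to_mel(0), hz_to_mel(sr / 2), n_filters + 2)
    bins = np.floor((frame_len + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fbank = np.zeros((n_filters, frame_len // 2 + 1))
    for i in range(1, n_filters + 1):
        left, center, right = bins[i - 1], bins[i], bins[i + 1]
        for k in range(left, center):
            fbank[i - 1, k] = (k - left) / max(center - left, 1)
        for k in range(center, right):
            fbank[i - 1, k] = (right - k) / max(right - center, 1)
    return fbank

def mfcc(signal, sr=16000, n_mfcc=13, frame_len=512, hop=256, n_filters=26):
    """MFCCs: log mel-filterbank energies of the power spectrum, decorrelated by a DCT."""
    power = np.abs(stft(signal, frame_len, hop)) ** 2
    mel_energy = power @ mel_filterbank(n_filters, frame_len, sr).T
    log_mel = np.log(mel_energy + 1e-10)
    return dct(log_mel, type=2, axis=1, norm="ortho")[:, :n_mfcc]

# Example: 1 s of a 440 Hz tone sampled at 16 kHz
t = np.linspace(0, 1, 16000, endpoint=False)
feats = mfcc(np.sin(2 * np.pi * 440 * t))  # one 13-coefficient vector per frame
```

In practice, a library such as librosa is commonly used for this step; the per-frame coefficient vectors (often summarized by their mean over time) then serve as input features to the classifiers compared in the study.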
Publisher
Springer Science and Business Media LLC
References (56 articles)
Cited by: 4 articles.