Abstract
Audio recognition can be used in smart cities for security, surveillance, manufacturing, autonomous vehicles, and noise mitigation, just to name a few. However, urban sounds are everyday audio events that occur daily, presenting unstructured characteristics containing different genres of noise and sounds unrelated to the sound event under study, making it a challenging problem. Therefore, the main objective of this literature review is to summarize the most recent works on this subject to understand the current approaches and identify their limitations. Based on the reviewed articles, it can be realized that Deep Learning (DL) architectures, attention mechanisms, data augmentation techniques, and pretraining are the most crucial factors to consider while creating an efficient sound classification model. The best-found results were obtained by Mushtaq and Su, in 2020, using a DenseNet-161 with pretrained weights from ImageNet, and NA-1 and NA-2 as augmentation techniques, which were of 97.98%, 98.52%, and 99.22% for UrbanSound8K, ESC-50, and ESC-10 datasets, respectively. Nonetheless, the use of these models in real-world scenarios has not been properly addressed, so their effectiveness is still questionable in such situations.
Funder
European Regional Development Fund
Subject
Electrical and Electronic Engineering,Biochemistry,Instrumentation,Atomic and Molecular Physics, and Optics,Analytical Chemistry
Cited by
11 articles.
订阅此论文施引文献
订阅此论文施引文献,注册后可以免费订阅5篇论文的施引文献,订阅后可以查看论文全部施引文献