Authors:
Ardon Kotey, Allan Almeida, Nihal Gupta, Dr. Vinaya Sawant
Abstract
Birds are meaningful to a wide audience, including the general public. They inhabit almost every type of environment and occupy almost every niche within those environments. Monitoring species diversity and migration is important for almost all conservation efforts. The analysis of long-term audio data is vital to support those efforts, but it relies on complex algorithms that must adapt to changing environmental conditions. Convolutional neural networks (CNNs) are powerful machine learning tools that have proven effective in image processing and sound recognition. In this paper, a CNN system for classifying bird sounds is presented and tested across different configurations and hyperparameters. The pre-trained MobileNet CNN model is fine-tuned using a dataset acquired from the Xeno-canto bird song sharing portal, which provides a large collection of labeled and categorized recordings. Spectrograms generated from the downloaded recordings serve as the input to the neural network. The accompanying experiments compare various configurations, including the number of classes (bird species) and the color scheme of the spectrograms. Results suggest that choosing a color map consistent with the images the network was pre-trained on provides a measurable advantage. The presented system is viable only for a low number of classes.
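As a rough illustration of the pipeline the abstract describes, the Python sketch below renders a recording as a mel-spectrogram image under a selectable color map and attaches a new classification head to an ImageNet-pre-trained MobileNet. The sample rate, mel-band count, image size, file paths, and the specific color map are illustrative assumptions, not the paper's exact preprocessing or training settings.

# Hedged sketch: spectrogram rendering + MobileNet transfer learning.
# All parameter values below are assumptions for illustration only.
import numpy as np
import librosa
import matplotlib.pyplot as plt
import tensorflow as tf

def save_spectrogram(wav_path, out_path, cmap="viridis"):
    """Render a mel spectrogram as an RGB image using the given color map.

    The `cmap` argument corresponds to the "color scheme" variable the
    experiments compare (e.g. a colorful map vs. grayscale).
    """
    y, sr = librosa.load(wav_path, sr=22050)          # assumed sample rate
    mel = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=128)
    mel_db = librosa.power_to_db(mel, ref=np.max)     # log-scaled magnitudes
    plt.figure(figsize=(2.24, 2.24), dpi=100)         # approx. 224x224 px target
    plt.axis("off")
    plt.imshow(mel_db, aspect="auto", origin="lower", cmap=cmap)
    plt.savefig(out_path, bbox_inches="tight", pad_inches=0)
    plt.close()

def build_model(num_classes):
    """MobileNet pre-trained on ImageNet with a fresh classification head."""
    base = tf.keras.applications.MobileNet(
        weights="imagenet", include_top=False, input_shape=(224, 224, 3))
    base.trainable = False                            # freeze backbone for fine-tuning
    x = tf.keras.layers.GlobalAveragePooling2D()(base.output)
    out = tf.keras.layers.Dense(num_classes, activation="softmax")(x)
    model = tf.keras.Model(base.input, out)
    model.compile(optimizer="adam",
                  loss="categorical_crossentropy",
                  metrics=["accuracy"])
    return model

A classifier for, say, ten bird species would then be created with build_model(10) and trained on the generated spectrogram images; passing a natural-image-like color map to save_spectrogram reflects the reported advantage of matching the color scheme to the network's pre-training data.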