Abstract
This paper proposes a multichannel environmental sound segmentation method. Environmental sound segmentation is an integrated approach that achieves sound source localization, sound source separation, and classification simultaneously. When multiple microphones are available, spatial features can be used to improve the localization and separation accuracy of sounds arriving from different directions; however, conventional methods have three drawbacks: (a) When sound source localization and separation using spatial features and classification using spectral features are trained in the same neural network, the network may overfit to the relationship between the direction of arrival and the class of a sound, reducing its reliability on unseen events. (b) Although permutation invariant training, as used in automatic speech recognition, could be extended to this task, it is impractical for environmental sounds, which may include an unlimited number of sound sources. (c) Various spatial features, such as the complex values of the short-time Fourier transform and interchannel phase differences, have been used, but no study has compared them. The proposed method comprises two discrete blocks: a sound source localization and separation block and a sound source separation and classification block. By separating the blocks, overfitting to the relationship between the direction of arrival and the class is avoided. Simulation experiments on created datasets containing 75 classes of environmental sounds showed that the root mean squared error of the proposed method was lower than that of conventional methods.
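The spatial features named in the abstract, complex short-time Fourier transform (STFT) values and interchannel phase differences (IPD), can be illustrated with a minimal NumPy sketch. This is not the paper's implementation; the function names, window, FFT size, and hop length below are illustrative choices. The IPD of a time-frequency bin is the phase of the cross-spectrum between a reference channel and another channel.

```python
import numpy as np

def stft(x, n_fft=512, hop=128):
    """Frame the signal, apply a Hann window, and take the FFT of each frame."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop : i * hop + n_fft] * window
                       for i in range(n_frames)])
    return np.fft.rfft(frames, axis=1)  # shape: (n_frames, n_fft // 2 + 1)

def interchannel_phase_difference(x_ref, x_other, n_fft=512, hop=128):
    """IPD per time-frequency bin: phase of the cross-spectrum (in (-pi, pi])."""
    s_ref = stft(x_ref, n_fft, hop)
    s_other = stft(x_other, n_fft, hop)
    return np.angle(s_other * np.conj(s_ref))

# Toy example: a 440 Hz tone arriving at the second microphone 8 samples later,
# so the IPD at the tone's frequency is about -2*pi*440*8/16000 radians.
fs = 16000
t = np.arange(fs) / fs
delay = 8
ch0 = np.sin(2 * np.pi * 440 * t)
ch1 = np.roll(ch0, delay)
ipd = interchannel_phase_difference(ch0, ch1)
```

For a sound from a known direction, the IPD varies predictably with frequency and microphone spacing, which is why such features support localization and separation without carrying class information.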
Funder
Japan Society for the Promotion of Science
Publisher
Springer Science and Business Media LLC
Cited by
8 articles.