Abstract
This paper proposes a multichannel environmental sound segmentation method. Environmental sound segmentation is an integrated approach that achieves sound source localization, sound source separation, and classification simultaneously. When multiple microphones are available, spatial features can be used to improve the localization and separation accuracy of sounds arriving from different directions; however, conventional methods have three drawbacks: (a) When sound source localization and separation using spatial features and classification using spectral features are trained in the same neural network, the network may overfit to the relationship between the direction of arrival and the class of a sound, reducing its reliability on unseen events. (b) Although permutation invariant training, as used in automatic speech recognition, could be extended to this task, it is impractical for environmental sounds, which may include an unlimited number of sound sources. (c) Various spatial features, such as the complex values of the short-time Fourier transform and interchannel phase differences, have been used, but no study has compared them. The proposed method comprises two discrete blocks: a sound source localization and separation block and a sound source separation and classification block. By separating the blocks, overfitting to the relationship between the direction of arrival and the class is avoided. Simulation experiments on created datasets containing 75 classes of environmental sounds showed that the root mean squared error of the proposed method was lower than that of conventional methods.
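The spatial features named in the abstract, complex short-time Fourier transform (STFT) values and interchannel phase differences (IPD), can be illustrated with a minimal NumPy sketch. This is not the paper's implementation; the function names, window, FFT size, and hop length below are illustrative choices. The IPD of a time-frequency bin is the phase of the cross-spectrum between a reference channel and another channel.

```python
import numpy as np

def stft(x, n_fft=512, hop=128):
    """Frame the signal, apply a Hann window, and take the FFT of each frame."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop : i * hop + n_fft] * window
                       for i in range(n_frames)])
    return np.fft.rfft(frames, axis=1)  # shape: (n_frames, n_fft // 2 + 1)

def interchannel_phase_difference(x_ref, x_other, n_fft=512, hop=128):
    """IPD per time-frequency bin: phase of the cross-spectrum (in (-pi, pi])."""
    s_ref = stft(x_ref, n_fft, hop)
    s_other = stft(x_other, n_fft, hop)
    return np.angle(s_other * np.conj(s_ref))

# Toy example: a 440 Hz tone arriving at the second microphone 8 samples later,
# so the IPD at the tone's frequency is about -2*pi*440*8/16000 radians.
fs = 16000
t = np.arange(fs) / fs
delay = 8
ch0 = np.sin(2 * np.pi * 440 * t)
ch1 = np.roll(ch0, delay)
ipd = interchannel_phase_difference(ch0, ch1)
```

For a sound from a known direction, the IPD varies predictably with frequency and microphone spacing, which is why such features support localization and separation without carrying class information.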
Funder
Japan Society for the Promotion of Science
Publisher
Springer Science and Business Media LLC
Cited by
8 articles.