Abstract
Objective. Auditory attention decoding (AAD) determines which speaker a listener is focusing on by analyzing the listener's EEG. A convolutional neural network (CNN) has previously been adopted to extract spectro-spatial features (SSF) from short time intervals of EEG in order to detect auditory spatial attention without access to the stimuli. However, the SSF-CNN scheme does not consider the following factors. (a) Single-band frequency analysis cannot represent the EEG pattern precisely. (b) Power cannot represent the EEG features related to the dynamic patterns of the attended auditory stimulus. (c) The temporal features of EEG, which represent the relationship between the EEG and the attended stimulus, are not extracted. To solve these problems, the SSF-CNN scheme was modified. Approach. (a) Multiple frequency bands of EEG, rather than the alpha band alone, were analyzed to represent the EEG pattern more precisely. (b) Differential entropy (DE), rather than power, was extracted from each frequency band to represent the degree of disorder of the EEG, which is related to the dynamic patterns of the attended auditory stimulus. (c) A CNN and a convolutional long short-term memory network (ConvLSTM) were combined to extract spectro-spatial-temporal features from a 3D descriptor sequence constructed from the topographical activity maps of the multiple frequency bands. Main results. Experimental results on the KUL, DTU, and PKU datasets with 0.1 s, 1 s, 2 s, and 5 s decision windows demonstrated the following. (a) The proposed model outperformed SSF-CNN and state-of-the-art AAD models. Specifically, when the auditory stimulus was unavailable, AAD accuracy was enhanced by at least 3.25%, 3.96%, and 5.08% on KUL, DTU, and PKU, respectively, compared with the baselines. On KUL, a longer decision window corresponded to a smaller enhancement, whereas on DTU and PKU a longer decision window corresponded to a larger enhancement, with two exceptions: the 2 s decision window on PKU and the 5 s decision window on DTU. (b) Each modification contributed to the performance enhancement. Significance. The DE feature, multi-band frequency analysis, and ConvLSTM-based temporal analysis help to enhance AAD accuracy.
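As an illustration of the per-band differential entropy feature described in the Approach, the following is a minimal sketch and not the authors' pipeline. It assumes that each band-limited EEG segment is approximately Gaussian, in which case the differential entropy has the closed form 0.5 · ln(2πeσ²), where σ² is the variance of the band-passed signal. The band edges, the filter order, and the function name `band_differential_entropy` are hypothetical choices made for this example; the abstract does not specify them.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt

# Hypothetical band edges in Hz; the abstract does not list the bands used.
BANDS = {
    "delta": (1, 4),
    "theta": (4, 8),
    "alpha": (8, 13),
    "beta": (13, 30),
    "gamma": (30, 45),
}


def band_differential_entropy(eeg, fs, bands=BANDS, order=4):
    """Return an array of shape (n_bands, n_channels) with per-band DE.

    eeg : array of shape (n_channels, n_samples), one decision window.
    fs  : sampling rate in Hz.
    Assumes each band-passed channel is roughly Gaussian, so that
    DE = 0.5 * ln(2 * pi * e * variance).
    """
    eeg = np.asarray(eeg, dtype=float)
    de = np.empty((len(bands), eeg.shape[0]))
    for i, (low, high) in enumerate(bands.values()):
        # Zero-phase Butterworth band-pass filter along the time axis.
        sos = butter(order, [low, high], btype="bandpass", fs=fs, output="sos")
        filtered = sosfiltfilt(sos, eeg, axis=-1)
        variance = filtered.var(axis=-1)
        de[i] = 0.5 * np.log(2 * np.pi * np.e * variance)
    return de
```

Calling `band_differential_entropy(window, fs=128)` on a 64-channel window would give a 5 × 64 array under these assumed bands. Projecting each row onto the electrode layout would then yield one topographical map per band, which is the kind of input the 3D descriptor sequence is built from. Very short windows (for example 0.1 s) contain too few samples for this zero-phase filter and would need the filtering to be applied to the continuous recording before segmentation.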
Funder
National Natural Science Foundation of China
Subject
Cellular and Molecular Neuroscience, Biomedical Engineering
Cited by
6 articles.