Extracting the Auditory Attention in a Dual-Speaker Scenario From EEG Using a Joint CNN-LSTM Model

Published:2021-08-02 Volume:12
ISSN:1664-042X
Container-title:Frontiers in Physiology
Short-container-title:Front. Physiol.

Author:

Kuruvila Ivine, Muncke Jan, Fischer Eghart, Hoppe Ulrich

Abstract

The human brain performs remarkably well at segregating a particular speaker from interfering ones in a multispeaker scenario. We can quantitatively evaluate this segregation capability by modeling the relationship between the speech signals present in an auditory scene and the listener's cortical signals measured using electroencephalography (EEG). This has opened up avenues to integrate neuro-feedback into hearing aids, where the device can infer the user's attention and enhance the attended speaker. Commonly used algorithms to infer auditory attention are based on linear systems theory, where cues such as speech envelopes are mapped onto the EEG signals. Here, we present a joint convolutional neural network (CNN) and long short-term memory (LSTM) model to infer auditory attention. Our joint CNN-LSTM model takes the EEG signals and the spectrograms of the multiple speakers as inputs and classifies the attention to one of the speakers. We evaluated the reliability of our network using three different datasets comprising 61 subjects in total, where each subject undertook a dual-speaker experiment. The three datasets corresponded to speech stimuli presented in three different languages, namely German, Danish, and Dutch. Using the proposed joint CNN-LSTM model, we obtained a median decoding accuracy of 77.2% at a trial duration of 3 s. Furthermore, we evaluated the amount of sparsity that the model can tolerate by means of magnitude pruning and found a tolerance of up to 50% sparsity without substantial loss of decoding accuracy.
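
The abstract mentions magnitude pruning without saying how it was applied (per layer or globally, one-shot or iterative). As a rough illustration of the general idea only, and not the authors' procedure, the sketch below zeroes the smallest-magnitude entries of a single weight array with NumPy. The function name `magnitude_prune` and the 8x8 example array are hypothetical.

```python
import numpy as np


def magnitude_prune(weights: np.ndarray, sparsity: float) -> np.ndarray:
    """Return a copy of `weights` with the smallest-magnitude entries set to zero.

    `sparsity` is the fraction of entries to zero out (0.5 means 50%).
    Hypothetical helper for illustration; not taken from the paper.
    """
    if not 0.0 <= sparsity < 1.0:
        raise ValueError("sparsity must be in [0, 1)")
    k = int(round(sparsity * weights.size))  # number of entries to zero out
    pruned = weights.copy()
    if k == 0:
        return pruned
    # Flat indices of the k entries with the smallest absolute value.
    drop = np.argpartition(np.abs(weights).ravel(), k - 1)[:k]
    pruned.flat[drop] = 0.0
    return pruned


# Example: prune a random 8x8 weight matrix to 50% sparsity.
rng = np.random.default_rng(0)
w = rng.normal(size=(8, 8))
w_pruned = magnitude_prune(w, sparsity=0.5)
print(np.count_nonzero(w_pruned), "of", w.size, "weights remain")
```

In practice, pruning of this kind is usually applied to each trainable layer of a network and followed by an accuracy check, which is how a tolerance such as the 50% figure reported above would be measured.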

Funder

Johannes und Frieda Marohn-Stiftung

Publisher

Frontiers Media SA

Subject

Physiology (medical),Physiology

References: 60 articles.

1. Human cortical responses to the speech envelope;Aiken;Ear Hear,2008

2. Auditory-inspired speech envelope extraction methods for improved EEG-based auditory attention detection in a cocktail party scenario;Biesmans;IEEE Trans. Neural Syst. Rehabil. Eng,2017

3. Semantic context enhances the early auditory encoding of natural speech;Broderick;J. Neurosci,2019

4. Some experiments on the recognition of speech, with one and with two ears;Cherry;J. Acoust. Soc. Am,1953

5. Comparison of two-talker attention decoding from EEG with nonlinear neural networks and linear methods;Ciccarelli;Sci. Rep,2019

Cited by 19 articles.

1. DGSD: Dynamical graph self-distillation for EEG-based auditory spatial attention detection;Neural Networks;2024-11

2. Subject-independent auditory spatial attention detection based on brain topology modeling and feature distribution alignment;Hearing Research;2024-11

3. Brain connectivity and time-frequency fusion-based auditory spatial attention detection;Neuroscience;2024-09

4. Attention-guided graph structure learning network for EEG-enabled auditory attention detection;Journal of Neural Engineering;2024-05-30

5. Investigating Self-Supervised Deep Representations for EEG-Based Auditory Attention Decoding;ICASSP 2024 - 2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP);2024-04-14