Subject-Agnostic Transformer-Based Neural Speech Decoding from Surface and Depth Electrode Signals

Published: 2024-03-14

Author:

Chen Junbo, Chen Xupeng, Wang Ran, Le Chenqian, Khalilian-Gourtani Amirhossein, Jensen Erika, Dugan Patricia, Doyle Werner, Devinsky Orrin, Friedman Daniel, Flinker Adeen, Wang Yao

Abstract

Objective: This study investigates speech decoding from neural signals captured by intracranial electrodes. Most prior approaches work only with electrodes on a 2D grid (i.e., an electrocorticographic, or ECoG, array) and with data from a single patient. We aim to design a deep-learning model architecture that can accommodate both surface (ECoG) and depth (stereotactic EEG, or sEEG) electrodes. The architecture should allow training on data from multiple participants with large variability in electrode placements, and the trained model should perform well on participants unseen during training.

Approach: We propose a novel transformer-based model architecture named SwinTW that can work with arbitrarily positioned electrodes by leveraging their 3D locations on the cortex rather than their positions on a 2D grid. We train both subject-specific models, using data from a single participant, and multi-patient models, exploiting data from multiple participants.

Main Results: The subject-specific models using only low-density 8x8 ECoG data achieved a high decoding Pearson correlation coefficient with the ground-truth spectrogram (PCC=0.817) over N=43 participants, outperforming our prior convolutional ResNet model and the 3D Swin transformer model. Incorporating the additional strip, depth, and grid electrodes available in each participant (N=39) led to further improvement (PCC=0.838). For participants with only sEEG electrodes (N=9), subject-specific models still achieved comparable performance, with an average PCC=0.798. The multi-subject models achieved high performance on unseen participants, with an average PCC=0.765 in leave-one-out cross-validation.

Significance: The proposed SwinTW decoder enables future speech neuroprostheses to utilize any electrode placement that is clinically optimal or feasible for a particular participant, including using only depth electrodes, which are more routinely implanted in chronic neurosurgical procedures. Importantly, the generalizability of the multi-patient models suggests the exciting possibility of developing speech neuroprostheses for people with speech disability without relying on their own neural data for training, which is not always feasible.
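
The two evaluation ideas named in the abstract, the Pearson correlation coefficient (PCC) between a decoded and a ground-truth spectrogram and leave-one-out cross-validation over participants, can be illustrated with a minimal sketch. This is not the authors' code. The function names are hypothetical, and computing the PCC over the flattened spectrogram is an assumption, since the abstract does not say whether the correlation is taken over the whole spectrogram or averaged per frequency bin.

```python
import numpy as np


def spectrogram_pcc(decoded, target):
    """Pearson correlation between two equal-shape spectrograms.

    Assumption: both arrays are flattened before correlating; the
    abstract does not specify the exact reduction.
    """
    x = np.asarray(decoded, dtype=float).ravel()
    y = np.asarray(target, dtype=float).ravel()
    if x.shape != y.shape:
        raise ValueError("spectrograms must have the same shape")
    # Center both signals, then normalize the covariance term.
    x = x - x.mean()
    y = y - y.mean()
    denom = np.sqrt((x * x).sum() * (y * y).sum())
    if denom == 0:
        raise ValueError("PCC is undefined for a constant spectrogram")
    return float((x * y).sum() / denom)


def leave_one_out_splits(participants):
    """Yield (training participants, held-out participant) pairs."""
    for i, held_out in enumerate(participants):
        yield participants[:i] + participants[i + 1:], held_out


if __name__ == "__main__":
    # Synthetic data only: a random "ground truth" and a noisy copy of it.
    rng = np.random.default_rng(0)
    truth = rng.normal(size=(16, 40))  # (frequency bins, time frames)
    noisy = truth + 0.5 * rng.normal(size=truth.shape)
    print("PCC:", spectrogram_pcc(noisy, truth))
    for train_ids, test_id in leave_one_out_splits(["p1", "p2", "p3"]):
        print("train on", train_ids, "test on", test_id)
```

In a multi-participant setting, each split would train one model on the listed training participants and report the PCC on the held-out participant; the average over all splits corresponds to the kind of leave-one-out figure the abstract reports.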

Publisher

Cold Spring Harbor Laboratory

References: 44 articles.

1. Speech synthesis from ECoG using densely connected 3D convolutional neural networks. Journal of Neural Engineering, 2019.

2. M. Angrick, M. Ottenhoff, L. Diener, D. Ivucic, G. Ivucic, S. Goulis, A. J. Colon, L. Wagner, D. J. Krusienski, P. L. Kubben, et al. Towards closed-loop speech synthesis from stereotactic EEG: a unit selection approach. In ICASSP 2022-2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 1296–1300. IEEE, 2022.

3. Real-time synthesis of imagined speech processes from minimally invasive recordings of neural activity. Commun Biol, 2021.

4. Speech synthesis from neural decoding of spoken sentences

5. Layer normalization. arXiv preprint, 2016.

Cited by 1 article.

1. Speech Synthesis from Electrocorticogram During Imagined Speech Using a Transformer-Based Decoder and Pretrained Vocoder. 2024-08-22.