A Neural Speech Decoding Framework Leveraging Deep Learning and Speech Synthesis

Published: 2023-09-17

Author:

Chen Xupeng, Wang Ran, Khalilian-Gourtani Amirhossein, Yu Leyao, Dugan Patricia, Friedman Daniel, Doyle Werner, Devinsky Orrin, Wang Yao, Flinker Adeen

Abstract

Decoding human speech from neural signals is essential for brain-computer interface (BCI) technologies that aim to restore speech function in populations with neurological deficits. However, it remains a highly challenging task, compounded by the scarce availability of neural signals with corresponding speech, data complexity and high dimensionality, and the limited publicly available source code. Here, we present a novel deep learning-based neural speech decoding framework that includes an ECoG Decoder that translates electrocorticographic (ECoG) signals from the cortex into interpretable speech parameters and a novel differentiable Speech Synthesizer that maps speech parameters to spectrograms. We develop a companion audio-to-audio auto-encoder consisting of a Speech Encoder and the same Speech Synthesizer to generate reference speech parameters that facilitate training of the ECoG Decoder. This framework generates natural-sounding speech and is highly reproducible across a cohort of 48 participants. Among three neural network architectures for the ECoG Decoder, the 3D ResNet model has the best decoding performance (PCC=0.804) in predicting the original speech spectrogram, closely followed by the SWIN model (PCC=0.796). Our experimental results show that our models can decode speech with high correlation even when limited to only causal operations, which is necessary for adoption by real-time neural prostheses. We successfully decode speech in participants with either left or right hemisphere coverage, which could lead to speech prostheses in patients with speech deficits resulting from left hemisphere damage. Further, we use an occlusion analysis to identify cortical regions contributing to speech decoding across our models. Finally, we provide open-source code for our two-stage training pipeline, along with associated preprocessing and visualization tools, to enable reproducible research and drive research across the speech science and prostheses communities.
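
The abstract reports decoding performance as a Pearson correlation coefficient (PCC) between the decoded and original speech spectrograms. The following is a minimal sketch of how such a score could be computed, assuming each spectrogram is a list of frames and that the correlation is taken over all time-frequency bins at once. The abstract does not state how the authors aggregate PCC (for example per trial or per participant), so the flattening step, the function names `pearson_cc` and `spectrogram_pcc`, and the toy values are illustrative assumptions, not the authors' implementation.

```python
import math
from typing import Sequence


def pearson_cc(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("inputs must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Unnormalized covariance and variances; the 1/n factors cancel in the ratio.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("PCC is undefined for a constant input")
    return cov / math.sqrt(var_x * var_y)


def spectrogram_pcc(decoded, original) -> float:
    """PCC over all time-frequency bins of two spectrograms (lists of frames).

    Assumption: both spectrograms have the same shape and are flattened
    frame by frame before correlating.
    """
    flat_decoded = [v for frame in decoded for v in frame]
    flat_original = [v for frame in original for v in frame]
    return pearson_cc(flat_decoded, flat_original)


# Toy 3-frame, 2-bin spectrograms (made-up values, not data from the paper).
original = [[0.1, 0.9], [0.4, 0.6], [0.8, 0.2]]
decoded = [[0.2, 0.8], [0.5, 0.5], [0.7, 0.3]]
score = spectrogram_pcc(decoded, original)
print(f"PCC = {score:.3f}")
```

A score of 1 means the decoded spectrogram is a perfect positive linear function of the original, and 0 means no linear relationship. The reported values of 0.804 and 0.796 therefore indicate a strong but imperfect match under whatever aggregation the authors used.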

Publisher

Cold Spring Harbor Laboratory

Cited by 1 article.

1. Subject-Agnostic Transformer-Based Neural Speech Decoding from Surface and Depth Electrode Signals, 2024-03-14