Noise-Robust Speech Recognition Through Auditory Feature Detection and Spike Sequence Decoding

Published:2014-03 Issue:3 Volume:26 Page:523-556
ISSN:0899-7667
Container-title:Neural Computation
language:en

Author:

Schafer Phillip B.¹,Jin Dezhe Z.¹

Affiliation:

1. Department of Physics and Center for Neural Engineering, The Pennsylvania State University, University Park, PA 16802, U.S.A.

Abstract

Speech recognition in noisy conditions is a major challenge for computer systems, but the human brain performs it routinely and accurately. Automatic speech recognition (ASR) systems that are inspired by neuroscience can potentially bridge the performance gap between humans and machines. We present a system for noise-robust isolated word recognition that works by decoding sequences of spikes from a population of simulated auditory feature-detecting neurons. Each neuron is trained to respond selectively to a brief spectrotemporal pattern, or feature, drawn from the simulated auditory nerve response to speech. The neural population conveys the time-dependent structure of a sound by its sequence of spikes. We compare two methods for decoding the spike sequences—one using a hidden Markov model–based recognizer, the other using a novel template-based recognition scheme. In the latter case, words are recognized by comparing their spike sequences to template sequences obtained from clean training data, using a similarity measure based on the length of the longest common sub-sequence. Using isolated spoken digits from the AURORA-2 database, we show that our combined system outperforms a state-of-the-art robust speech recognizer at low signal-to-noise ratios. Both the spike-based encoding scheme and the template-based decoding offer gains in noise robustness over traditional speech recognition methods. Our system highlights potential advantages of spike-based acoustic coding and provides a biologically motivated framework for robust ASR development.
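The template-based decoding described in the abstract can be illustrated with a minimal sketch. It assumes a spike sequence is represented as an ordered list of integer neuron indices, that similarity is the longest-common-subsequence (LCS) length divided by the mean of the two sequence lengths, and that a word is assigned the label of its most similar template. The normalization, the decision rule, and the names `lcs_length`, `similarity`, and `recognize` are illustrative assumptions, not details taken from the paper.

```python
from typing import Dict, List, Sequence


def lcs_length(a: Sequence[int], b: Sequence[int]) -> int:
    """Length of the longest common subsequence of a and b (dynamic programming)."""
    prev = [0] * (len(b) + 1)  # LCS lengths for the previous prefix of a
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def similarity(a: Sequence[int], b: Sequence[int]) -> float:
    """LCS length normalized by mean sequence length (assumed normalization)."""
    if not a or not b:
        return 0.0
    return lcs_length(a, b) / ((len(a) + len(b)) / 2)


def recognize(seq: Sequence[int], templates: Dict[str, List[Sequence[int]]]) -> str:
    """Return the label whose best-matching template is most similar to seq.

    Nearest-template classification is an assumption made for this sketch.
    """
    best_label, best_score = "", -1.0
    for label, sequences in templates.items():
        for template in sequences:
            score = similarity(seq, template)
            if score > best_score:
                best_label, best_score = label, score
    return best_label


# Hypothetical toy data: each integer identifies the neuron that spiked.
templates = {"one": [[3, 1, 4, 2]], "two": [[5, 6, 7]]}
noisy_word = [3, 4, 2]  # the spike from neuron 1 is missing
label = recognize(noisy_word, templates)
```

Because an LCS-based score only counts spikes that appear in matching order, a test sequence with a missing or spurious spike can still score highly against the correct template, which is one plausible reading of why such a measure suits noisy input.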

Publisher

MIT Press - Journals

Subject

Cognitive Neuroscience,Arts and Humanities (miscellaneous)

Link

https://www.mitpressjournals.org/doi/pdf/10.1162/NECO_a_00557

References: 98 articles.

1. The Spectro-Temporal Receptive Field

2. Combination of hidden Markov models with dynamic time warping for speech recognition

3. The PASCAL CHiME speech separation and recognition challenge

Cited by 7 articles.

1. Algorithmic governance and AI: balancing innovation and oversight in Indonesian policy analyst;AI & SOCIETY;2024-07-01

2. Coal Gangue Recognition in the Strong Background Noise Using Two-Level Auditory Feature Fusion with Attention Mechanism;2024

3. Neuromorphic acoustic sensing using an adaptive microelectromechanical cochlea with integrated feedback;Nature Electronics;2023-05-04

4. Machine learning-based approach: Global trends, research directions, and regulatory standpoints;Data Science and Management;2021-12

5. A spiking network that learns to extract spike signatures from speech signals;Neurocomputing;2017-05