An Auditory Saliency Pooling-Based LSTM Model for Speech Intelligibility Classification

Published:2021-09-18 Issue:9 Volume:13 Page:1728
ISSN:2073-8994
Container-title:Symmetry
language:en
Short-container-title:Symmetry

Author:

Gallardo-Antolín Ascensión, Montero Juan M.

Abstract

Speech intelligibility is a crucial element in oral communication that can be influenced by multiple factors, such as noise, channel characteristics, or speech disorders. In this paper, we address the task of speech intelligibility classification (SIC) in the last of these circumstances, speech disorders. Taking as a starting point our previous work, a SIC system based on an attentional long short-term memory (LSTM) network, we deal with the problem of inadequate learning of the attention weights due to training data scarcity. To overcome this issue, the main contribution of this paper is a novel type of weighted pooling (WP) mechanism, called saliency pooling, in which the WP weights are not learned automatically during network training but are obtained from an external source of information, Kalinli’s auditory saliency model. In this way, we intend to take advantage of the apparent symmetry between the human auditory attention mechanism and the attentional models integrated into deep learning networks. The developed systems are assessed on the UA-Speech dataset, which comprises speech uttered by subjects with several dysarthria levels. Results show that all the systems with saliency pooling significantly outperform a reference support vector machine (SVM)-based system and LSTM-based systems with mean pooling and attention pooling, suggesting that Kalinli’s saliency can be successfully incorporated into the LSTM architecture as an external cue for estimating the speech intelligibility level.
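
The three pooling strategies named in the abstract can be contrasted with a minimal NumPy sketch. It is an illustration under stated assumptions, not the authors' implementation: the function names, the softmax scoring with a trained vector `u`, the sum-to-one normalization of the saliency scores, and the toy values are all hypothetical, since the abstract does not give the exact formulation.

```python
import numpy as np

def mean_pooling(h):
    """Average the LSTM outputs over time. h has shape (T, D)."""
    return h.mean(axis=0)

def weighted_pooling(h, weights):
    """Weighted sum of frames. weights has shape (T,) and sums to 1."""
    return weights @ h

def attention_weights(h, u):
    """Learned weights (assumed form): softmax over time of the scores h @ u."""
    scores = h @ u
    scores = scores - scores.max()  # subtract the max for numerical stability
    e = np.exp(scores)
    return e / e.sum()

def saliency_weights(saliency, eps=1e-8):
    """External weights (assumed form): non-negative per-frame saliency
    scores, normalized so that they sum to 1."""
    s = np.clip(saliency, 0.0, None) + eps
    return s / s.sum()

# Toy data: T frames of D-dimensional LSTM outputs (random stand-ins).
rng = np.random.default_rng(0)
T, D = 5, 4
h = rng.normal(size=(T, D))

# Hypothetical per-frame saliency scores from an external auditory model.
saliency = np.array([0.1, 0.9, 0.4, 0.0, 0.6])

w = saliency_weights(saliency)
z = weighted_pooling(h, w)  # utterance-level vector passed on to the classifier
```

In this sketch the only difference between attention pooling and saliency pooling is where the weights come from: `attention_weights` depends on a parameter that must be trained, whereas `saliency_weights` uses scores computed outside the network, so no pooling parameters have to be learned from scarce data. With uniform weights, both reduce to mean pooling.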

Funder

Spanish Ministry of Economy, Industry and Competitiveness

Universidad Carlos III de Madrid

Publisher

MDPI AG

Subject

Physics and Astronomy (miscellaneous), General Mathematics, Chemistry (miscellaneous), Computer Science (miscellaneous)

Link

https://www.mdpi.com/2073-8994/13/9/1728/pdf
