Visual Speech Recognition with Lightweight Psychologically Motivated Gabor Features

Published: 2020-12-03 Issue: 12 Volume: 22 Page: 1367
ISSN: 1099-4300
Container-title: Entropy
Language: en
Short-container-title: Entropy

Author:

Zhang Xuejie, Xu Yan, Abel Andrew K., Smith Leslie S., Watt Roger, Hussain Amir, Gao Chengxiang

Abstract

Extraction of relevant lip features is of continuing interest in the visual speech domain. End-to-end feature extraction can produce good results, but those results are difficult for humans to comprehend and relate to. We present a new, lightweight feature extraction approach, motivated by human-centric glimpse-based psychological research into facial barcodes, and demonstrate that these simple, easy-to-extract 3D geometric features (produced using Gabor-based image patches) can successfully be used for speech recognition with LSTM-based machine learning. This approach can extract low-dimensionality lip parameters with a minimum of processing. One key difference between these Gabor-based features and other features, such as traditional DCT or the currently fashionable CNN features, is that they are human-centric and can be visualised and analysed by humans, which makes the results easier to explain. They can also be used for reliable speech recognition, as demonstrated using the Grid corpus. Results for overlapping speakers using our lightweight system gave a recognition rate of over 82%, which compares well to less explainable features in the literature.
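
To give a rough sense of what a Gabor-based image patch responds to, the following is a minimal sketch and not the authors' pipeline. It builds a generic 2D Gabor kernel with NumPy and applies it to a synthetic image containing one horizontal bar, a crude stand-in for a mouth opening. The function names, kernel parameters, and test image are all assumptions chosen for illustration; the abstract does not specify them.

```python
import numpy as np

def gabor_kernel(size=21, wavelength=16.0, theta=0.0, sigma=4.0, gamma=0.5):
    """Real part of a generic 2D Gabor filter (illustrative parameters only)."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1].astype(float)
    # Rotate coordinates so the cosine carrier oscillates along direction theta.
    x_r = x * np.cos(theta) + y * np.sin(theta)
    y_r = -x * np.sin(theta) + y * np.cos(theta)
    envelope = np.exp(-(x_r**2 + (gamma * y_r) ** 2) / (2.0 * sigma**2))
    carrier = np.cos(2.0 * np.pi * x_r / wavelength)
    return envelope * carrier

def filter_response(image, kernel):
    """Valid-mode 2D correlation of image with kernel using sliding windows."""
    windows = np.lib.stride_tricks.sliding_window_view(image, kernel.shape)
    return np.einsum("ijkl,kl->ij", windows, kernel)

# Hypothetical test image: a bright horizontal bar on a dark background.
image = np.zeros((64, 64))
image[28:36, :] = 1.0

# theta = pi/2 makes the carrier vary down the rows, i.e. horizontal stripes.
horizontal = filter_response(image, gabor_kernel(theta=np.pi / 2))
vertical = filter_response(image, gabor_kernel(theta=0.0))

# Row of the strongest horizontal response, mapped back to image coordinates.
peak_row = int(np.argmax(np.abs(horizontal).max(axis=1))) + 21 // 2
```

In this toy setting the horizontally tuned kernel responds more strongly than the vertically tuned one, and `peak_row` falls inside the bar. A handful of numbers of this kind (location and strength of the peak response) is the sort of low-dimensional, inspectable quantity the abstract contrasts with CNN features, though the paper's actual feature definitions may differ.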

Publisher

MDPI AG

Subject

General Physics and Astronomy

Link

https://www.mdpi.com/1099-4300/22/12/1367/pdf

Cited by 4 articles.

1. Lip2Speech: Lightweight Multi-Speaker Speech Reconstruction with Gabor Features;Applied Sciences;2024-01-17

2. Gabor-based Audiovisual Fusion for Mandarin Chinese Speech Recognition;2022 30th European Signal Processing Conference (EUSIPCO);2022-08-29

3. The Text-Dependent Chinese Speaker Recognition System Based on the Universal Individual Identification;2021 IEEE 9th International Conference on Information, Communication and Networks (ICICN);2021-11-25

4. Human-Centric AI: The Symbiosis of Human and Artificial Intelligence;Entropy;2021-03-11