Authors:
Wang Jianrong, Zhang Yichao, Liu Wei, Chen Yu
Abstract
The lip region provides the most direct visual information in multisensory speech perception, and this information is used in both speech recognition and lip reading. In this paper, we extract eight lip features during the articulation of the basic vowels [a], [e], [i], [u], and [ü] in Standard Chinese and, drawing on articulatory phonetics, analyze how effectively these features distinguish the five vowels. We then use a Dense Convolutional Network (DenseNet) to process two-dimensional lip images and fuse the result with the lip features to recognize Chinese syllables that contain consonants. The results show that lip shape features behave consistently in Chinese vowel recognition and in Chinese consonant lip reading, and that fusing two-dimensional lip images with lip features effectively improves the recognition rate in lip reading.
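The abstract does not list the eight lip features or specify how they are fused with the DenseNet output. The sketch below shows one common way to do feature-level fusion. It assumes the input is a contour of outer-lip landmarks, and it uses four geometric measurements (width, height, aspect ratio, enclosed area) as hypothetical stand-ins for the paper's features. The image embedding is a placeholder vector, and the function names `lip_shape_features` and `fuse` are illustrative only. This is not the authors' method.

```python
import numpy as np

def lip_shape_features(landmarks: np.ndarray) -> np.ndarray:
    """Hypothetical geometric lip features from an outer-lip contour.

    landmarks: (N, 2) array of (x, y) points ordered around the contour.
    Returns [width, height, aspect_ratio, area]; these are illustrative
    stand-ins, not the eight features used in the paper.
    """
    xs, ys = landmarks[:, 0], landmarks[:, 1]
    width = xs.max() - xs.min()            # horizontal lip spread
    height = ys.max() - ys.min()           # vertical lip opening
    aspect = height / width if width > 0 else 0.0
    # Shoelace formula: area of the polygon enclosed by the contour.
    area = 0.5 * abs(np.dot(xs, np.roll(ys, -1)) - np.dot(ys, np.roll(xs, -1)))
    return np.array([width, height, aspect, area], dtype=float)

def fuse(image_embedding: np.ndarray, shape_features: np.ndarray) -> np.ndarray:
    """Feature-level fusion: L2-normalize each part, then concatenate.

    image_embedding stands in for a CNN (e.g. DenseNet) output vector;
    normalizing keeps one modality from dominating by sheer magnitude.
    """
    def l2(v: np.ndarray) -> np.ndarray:
        n = np.linalg.norm(v)
        return v / n if n > 0 else v
    return np.concatenate([l2(image_embedding), l2(shape_features)])

if __name__ == "__main__":
    # A rectangular "contour" 4 units wide and 2 units high.
    contour = np.array([[0, 0], [4, 0], [4, 2], [0, 2]], dtype=float)
    feats = lip_shape_features(contour)
    print(feats)                          # [4.  2.  0.5 8. ]
    fused = fuse(np.array([0.3, 0.1, 0.9]), feats)
    print(fused.shape)                    # (7,)
```

Normalizing each part before concatenation is one design choice among several. A learned projection or a weighted sum would work equally well as a starting point, and the fused vector would then feed a classifier over the target vowels or syllables.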
Subject
General Physics and Astronomy