Abstract
Lip reading has attracted increasing attention recently due to advances in deep learning. However, most research targets English datasets. The study of Chinese lip-reading technology is still in its initial stage. Firstly, in this paper, we expand the naturally distributed word-level Chinese dataset called ‘Databox’ previously built by our laboratory. Secondly, the current state-of-the-art model consists of a residual network and a temporal convolutional network. The residual network leads to excessive computational cost and is not suitable for the on-device applications. In the new model, the residual network is replaced with ShuffleNet, which is an extremely computation-efficient Convolutional Neural Network (CNN) architecture. Thirdly, to help the network focus on the most useful information, we insert a simple but effective attention module called Convolutional Block Attention Module (CBAM) into the ShuffleNet. In our experiment, we compare several model architectures and find that our model achieves a comparable accuracy to the residual network (3.5 GFLOPs) under the computational budget of 1.01 GFLOPs.
Funder
the National Natural Science Foundation of China
Subject
Fluid Flow and Transfer Processes,Computer Science Applications,Process Chemistry and Technology,General Engineering,Instrumentation,General Materials Science
Cited by
9 articles.
订阅此论文施引文献
订阅此论文施引文献,注册后可以免费订阅5篇论文的施引文献,订阅后可以查看论文全部施引文献