Author:
Kanamaru Tatsuya, Arakane Taiki, Saitoh Takeshi
Abstract
Unlike the conventional frame-based camera, the event-based camera detects changes in the brightness of each pixel over time. This research explores lip-reading as a new application of the event-based camera. This paper proposes an event camera-based lip-reading method for isolated single-sound recognition. The proposed method consists of image generation from event data, detection of the face and facial feature points, and recognition using a Temporal Convolutional Network. Furthermore, this paper proposes a method that combines two modalities, a frame-based camera and an event-based camera. To evaluate the proposed method, utterance scenes of 15 Japanese consonants from 20 speakers were collected using an event-based camera and a video camera, and an original dataset was constructed. Several experiments were conducted by generating images at multiple frame rates from the event-based camera data. As a result, the highest recognition accuracy was obtained with event-based camera images at 60 fps. Moreover, it was confirmed that combining the two modalities yields higher recognition accuracy than a single modality.
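The abstract does not specify how images are generated from event data, so the following is only a minimal sketch of one common approach: summing event polarities per pixel over fixed time windows of 1/fps seconds. The event layout `(timestamp_us, x, y, polarity)` and the function name `events_to_frames` are assumptions made for illustration, not the authors' implementation.

```python
from typing import List, Tuple

# Hypothetical event layout (an assumption, not taken from the paper):
# (timestamp in microseconds, x, y, polarity), with polarity +1 or -1.
Event = Tuple[int, int, int, int]


def events_to_frames(
    events: List[Event], width: int, height: int, fps: int
) -> List[List[List[int]]]:
    """Accumulate events into signed-count frames, one per 1/fps seconds."""
    if not events:
        return []
    window_us = 1_000_000 / fps            # frame duration in microseconds
    t0 = min(e[0] for e in events)         # timestamp of the earliest event
    t_end = max(e[0] for e in events)      # timestamp of the latest event
    n_frames = int((t_end - t0) // window_us) + 1
    # frames[k][y][x] holds the summed polarity at pixel (x, y) in window k.
    frames = [[[0] * width for _ in range(height)] for _ in range(n_frames)]
    for t, x, y, p in events:
        idx = int((t - t0) // window_us)   # window this event falls into
        frames[idx][y][x] += p
    return frames
```

In this sketch, a higher `fps` gives shorter accumulation windows, so the same event stream is spread over more, sparser frames. That is the kind of trade-off the multi-frame-rate experiments compare.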
Cited by
5 articles.
1. Recent Advances in Bio-Inspired Vision Sensor: A Review;Journal of Circuits, Systems and Computers;2024-07-10
2. KuchiNavi: lip-reading-based navigation app;Fifteenth International Conference on Graphics and Image Processing (ICGIP 2023);2024-03-25
3. Faces in Event Streams (FES): An Annotated Face Dataset for Event Cameras;Sensors;2024-02-22
4. Can you read lips with a masked face?;2023 18th International Conference on Machine Vision and Applications (MVA);2023-07-23
5. Efficient DNN Model for Word Lip-Reading;Algorithms;2023-05-27