Isolated single sound lip-reading using a frame-based camera and event-based camera

Author:

Kanamaru Tatsuya, Arakane Taiki, Saitoh Takeshi

Abstract

Unlike a conventional frame-based camera, an event-based camera detects changes in the brightness of each pixel over time. This research investigates lip-reading as a new application of the event-based camera. This paper proposes an event camera-based lip-reading method for isolated single sound recognition. The proposed method consists of imaging from event data, detection of the face and facial feature points, and recognition using a Temporal Convolutional Network. Furthermore, this paper proposes a method that combines the two modalities of the frame-based camera and the event-based camera. To evaluate the proposed method, utterance scenes of 15 Japanese consonants from 20 speakers were collected using an event-based camera and a video camera, and an original dataset was constructed. Several experiments were conducted by generating images at multiple frame rates from the event-based camera data. As a result, the highest recognition accuracy was obtained with event-based camera images generated at 60 fps. Moreover, it was confirmed that combining the two modalities yields higher recognition accuracy than a single modality.
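
The abstract does not say how images are generated from event data, so the following is only a minimal sketch of one common approach: summing signed events per pixel within fixed time windows of 1/fps seconds. The function name `events_to_frames`, the event layout (timestamp in microseconds, x, y, polarity), and the signed-count representation are all assumptions made for illustration, not the authors' method.

```python
import numpy as np

def events_to_frames(t_us, x, y, polarity, height, width, fps):
    """Accumulate events into one signed-count image per 1/fps window.

    Hypothetical helper: the event layout and the accumulation rule
    are assumptions, not taken from the paper.
    """
    t_us = np.asarray(t_us, dtype=np.int64)
    x = np.asarray(x, dtype=np.int64)
    y = np.asarray(y, dtype=np.int64)
    polarity = np.asarray(polarity, dtype=np.int64)
    if t_us.size == 0:
        return np.zeros((0, height, width), dtype=np.int32)

    # Length of one time window in microseconds.
    window_us = 1_000_000 / fps
    # Window index of each event, measured from the earliest event.
    idx = ((t_us - t_us.min()) / window_us).astype(np.int64)
    n_frames = int(idx.max()) + 1

    frames = np.zeros((n_frames, height, width), dtype=np.int32)
    # +1 for a brightness increase, -1 for a decrease.
    sign = np.where(polarity > 0, 1, -1).astype(np.int32)
    # Unbuffered add, so repeated (frame, y, x) indices all count.
    np.add.at(frames, (idx, y, x), sign)
    return frames

# Four toy events on a 2x2 sensor, imaged at 60 fps.
frames = events_to_frames(
    t_us=[0, 1000, 20000, 40000],
    x=[0, 0, 1, 1],
    y=[0, 0, 1, 0],
    polarity=[1, 1, 0, 1],
    height=2, width=2, fps=60,
)
print(frames.shape)
```

Changing `fps` changes only the window length, which is how a single event recording could yield image sequences at several frame rates for comparison.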

Publisher

Frontiers Media SA

Subject

Artificial Intelligence


Cited by 5 articles.

1. Recent Advances in Bio-Inspired Vision Sensor: A Review;Journal of Circuits, Systems and Computers;2024-07-10

2. KuchiNavi: lip-reading-based navigation app;Fifteenth International Conference on Graphics and Image Processing (ICGIP 2023);2024-03-25

3. Faces in Event Streams (FES): An Annotated Face Dataset for Event Cameras;Sensors;2024-02-22

4. Can you read lips with a masked face?;2023 18th International Conference on Machine Vision and Applications (MVA);2023-07-23

5. Efficient DNN Model for Word Lip-Reading;Algorithms;2023-05-27
