A comprehensive multimodal dataset for contactless lip reading and acoustic analysis-Reference-Cited by-同舟云学术

A comprehensive multimodal dataset for contactless lip reading and acoustic analysis

Published:2023-12-13 Issue:1 Volume:10 Page:
ISSN:2052-4463
Container-title:Scientific Data
language:en
Short-container-title:Sci Data

Author:

Ge Yao^ORCID,Tang Chong,Li Haobo,Chen Zikang,Wang Jingyan,Li Wenda^ORCID,Cooper Jonathan,Chetty Kevin,Faccio Daniele^ORCID,Imran Muhammad^ORCID,Abbasi Qammer H.

Abstract

AbstractSmall-scale motion detection using non-invasive remote sensing techniques has recently garnered significant interest in the field of speech recognition. Our dataset paper aims to facilitate the enhancement and restoration of speech information from diverse data sources for speakers. In this paper, we introduce a novel multimodal dataset based on Radio Frequency, visual, text, audio, laser and lip landmark information, also called RVTALL. Specifically, the dataset consists of 7.5 GHz Channel Impulse Response (CIR) data from ultra-wideband (UWB) radars, 77 GHz frequency modulated continuous wave (FMCW) data from millimeter wave (mmWave) radar, visual and audio information, lip landmarks and laser data, offering a unique multimodal approach to speech recognition research. Meanwhile, a depth camera is adopted to record the landmarks of the subject’s lip and voice. Approximately 400 minutes of annotated speech profiles are provided, which are collected from 20 participants speaking 5 vowels, 15 words, and 16 sentences. The dataset has been validated and has potential for the investigation of lip reading and multimodal speech recognition.

Funder

Royal Society of Edinburgh

RCUK | Engineering and Physical Sciences Research Council

Publisher

Springer Science and Business Media LLC

Subject

Library and Information Sciences,Statistics, Probability and Uncertainty,Computer Science Applications,Education,Information Systems,Statistics and Probability

Link

https://www.nature.com/articles/s41597-023-02793-w.pdf

Reference30 articles.

1. Cai, C., Zheng, R. & Luo, J. Ubiquitous acoustic sensing on commodity IoT devices: A Survey. IEEE Communications Surveys Tutorials 24, 432–454, https://doi.org/10.1109/COMST.2022.3145856 (2022).

2. Benzeghiba, M. et al. Automatic speech recognition and speech variability: A review. Speech Communication 49, 763–786, https://doi.org/10.1016/j.specom.2007.02.006. Intrinsic Speech Variations (2007).

3. Gonzalez-Lopez, J. A., Gomez-Alanis, A., Martín Doñas, J. M., Pérez-Córdoba, J. L. & Gomez, A. M. Silent speech interfaces for speech restoration: A Review. IEEE Access 8, 177995–178021, https://doi.org/10.1109/ACCESS.2020.3026579 (2020).

4. Bednar, A. & Lalor, E. C. Where is the cocktail party? Decoding locations of attended and unattended moving sound sources using EEG. NeuroImage 205, 116283, https://doi.org/10.1016/j.neuroimage.2019.116283 (2020).

5. Liu, T. et al. Wavoice: A noise-resistant multi-modal speech recognition system fusing mmWave and audio signals. In Proceedings of the 19th ACM Conference on Embedded Networked Sensor Systems, SenSys ‘21, 97–110, https://doi.org/10.1145/3485730.3485945 (Association for Computing Machinery, New York, NY, USA, 2021).