A methodology of multimodal corpus creation for audio-visual speech recognition in assistive transport systems

Published:2020-12 Volume:5 Page:87-93
ISSN:2078-8320
Container-title:Informatization and communication
Short-container-title:I&C

Author:

Axyonov A.A., Ivanko D.V., Lashkov I.B., Ryumin D.A., Kashevnik A.M., Karpov A.A.

Abstract

This paper introduces a new methodology of multimodal corpus creation for audio-visual speech recognition in driver monitoring systems. Multimodal speech recognition makes it possible to rely on audio data when video data are unusable (e.g., at night) and on video data in acoustically noisy conditions (e.g., on highways). The article discusses several basic scenarios in which speech recognition in the vehicle environment is required to interact with the driver monitoring system. The methodology defines the main stages and requirements for the design of a multimodal corpus. The paper also describes the metaparameters that the multimodal corpus must satisfy. In addition, a software package for recording an audio-visual speech corpus is described.

Publisher

Informatization and Communication Journal Editorial Board

Subject

General Agricultural and Biological Sciences

References: 10 articles.

1. Falaki H. et al. Diversity in smartphone usage // Proceedings of the 8th international conference on Mobile systems, applications, and services. - 2010. - pp. 179-194.

2. Kashevnik A. et al. Cloud-Based Driver Monitoring System Using a Smartphone // IEEE Sensors Journal. - 2020. - Vol. 20. - No. 12. - pp. 6701-6715.

3. Kim J. et al. Context-based rider assistant system for two wheeled self-balancing vehicles // Труды СПИИРАН (SPIIRAS Proceedings). - 2019. - Vol. 18. - No. 3. - pp. 582-613.

4. Kipyatkova I. S., Karpov A. A. Variants of deep artificial neural networks for speech recognition systems // Труды СПИИРАН (SPIIRAS Proceedings). - 2016. - Vol. 49. - pp. 80-103.

5. Afouras T. et al. Deep audio-visual speech recognition // IEEE transactions on pattern analysis and machine intelligence. - 2018.