Speech Recorder and Translator using Google Cloud Speech-to-Text and Translation

Published:2021-11-30 Issue:1 Volume:9 Page:11-28
ISSN:1823-5042
Container-title:Journal of IT in Asia
Short-container-title:JITA

Author:

Wang Hui Hui

Abstract

YouTube, the most popular video website, has about 2 billion users worldwide who speak and understand different languages. Subtitles are essential for users to follow the message of a video. However, not all video owners provide subtitles, which makes it difficult for potential audiences to understand the content. This study therefore proposed a speech recorder and translator to address the problem. The general concept was to combine Automatic Speech Recognition (ASR) and translation technologies to recognize the speech in a video and translate it into other languages. The paper compared and discussed three ASR technologies: Google Cloud Speech-to-Text, Limecraft Transcriber, and VoxSigma. The proposed system used Google Cloud Speech-to-Text because it supports more languages than Limecraft Transcriber and VoxSigma and was more flexible to use with Google Cloud Translation. The paper also included a questionnaire on the crucial features of a speech recorder and translator, in which a total of 19 university students participated. Most respondents stated that high translation accuracy is vital for the proposed system. The paper also discussed a related work on speech recording and translation: a study that compared speech recognition for ordinary and speech-impaired voices and used a mobile application to record voice input. In contrast to that existing mobile app, this project proposed a web application, which makes it a new and different study, especially in terms of development and user experience. The proposed system was developed successfully. The results showed that Google Cloud Speech-to-Text and Translation were reliable for video translation. However, the system could not recognize speech when the background music was too loud, and it suffered from overly direct translation, which remained a challenge. Future research may therefore need a custom-trained model. In conclusion, the proposed system contributes a new idea: a web application that addresses the language barrier on video-watching platforms.
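
The general concept described in the abstract, speech recognition followed by translation, can be illustrated with a minimal Python sketch. This is not the authors' implementation (the paper describes a web application whose code is not shown here). The function names `transcribe_and_translate`, `google_recognizer`, and `google_translator` are hypothetical, and the settings for audio encoding, sample rate, and language codes are assumptions chosen only for illustration. The two Google-backed helpers assume the `google-cloud-speech` and `google-cloud-translate` client libraries are installed and credentials are configured; the synchronous recognition call they use is intended for short audio clips.

```python
from typing import Callable, Dict, List


def transcribe_and_translate(
    audio_bytes: bytes,
    recognize: Callable[[bytes], List[str]],
    translate: Callable[[str], str],
) -> List[Dict[str, str]]:
    """Run speech recognition, then translate each recognized segment."""
    subtitles = []
    for transcript in recognize(audio_bytes):
        text = transcript.strip()
        if not text:
            continue  # skip empty segments, e.g. where no speech was recognized
        subtitles.append({"source": text, "translated": translate(text)})
    return subtitles


def google_recognizer(language_code: str = "en-US", sample_rate_hertz: int = 16000):
    """Hypothetical helper wrapping Google Cloud Speech-to-Text (assumed settings)."""
    from google.cloud import speech  # imported lazily; needs credentials at runtime

    client = speech.SpeechClient()
    config = speech.RecognitionConfig(
        encoding=speech.RecognitionConfig.AudioEncoding.LINEAR16,  # assumption
        sample_rate_hertz=sample_rate_hertz,
        language_code=language_code,
    )

    def recognize(audio_bytes: bytes) -> List[str]:
        audio = speech.RecognitionAudio(content=audio_bytes)
        response = client.recognize(config=config, audio=audio)
        # Keep the top-ranked alternative of each recognized segment.
        return [r.alternatives[0].transcript for r in response.results if r.alternatives]

    return recognize


def google_translator(target_language: str):
    """Hypothetical helper wrapping the Google Cloud Translation (v2) client."""
    from google.cloud import translate_v2 as translate  # imported lazily

    client = translate.Client()

    def translate_text(text: str) -> str:
        return client.translate(text, target_language=target_language)["translatedText"]

    return translate_text
```

With credentials in place, a call such as `transcribe_and_translate(audio, google_recognizer("en-US"), google_translator("ms"))` would return one source/translation pair per recognized segment. Passing the recognizer and translator as plain callables keeps the pipeline independent of any one provider, so another ASR service could be swapped in without changing the pipeline function.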

Publisher

UNIMAS Publisher

Cited by 5 articles.

1. Websites Optimization Metrics: A Systematic Literature Review;Journal of Prospective Researches;2024-04-10

2. GlobalLingua: Empowering Multilingual Access to YouTube Video Transcripts with Automated Translation;Journal of Prospective Researches;2024-04-09

3. Bridging the Linguistic Gap;Automatic Speech Recognition and Translation for Low Resource Languages;2024-03-29

4. Development and Assessment of Internet of Things-Driven Smart Home Security and Automation with Voice Commands;IoT;2024-02-01

5. Applying automated machine translation to educational video courses;Education and Information Technologies;2023-10-02