Author:
Choi Yong-Sik, Kang Jin-Gu, Joo Jong Wha J., Jung Jin-Woo
Abstract
IBM Watson is a representative speech recognition tool that can automatically generate not only speech-to-text output but also speaker ID and timing information; the combined result is called an Informatized Caption. However, noise in the voice signal sent to the IBM Watson API significantly degrades recognition performance. This is common in movies with background music and special sound effects. This paper aims to improve the accuracy of current Informatized Captions in noisy environments. Based on the original caption and the Informatized Caption information, it proposes a method for correcting incorrectly recognized words and a method for enhancing timing accuracy while updating a database in real time. Experimental results show that the proposed method achieves 81.09% timing accuracy on 10 representative animation, horror, and action movies.
Publisher
Springer Science and Business Media LLC
Subject
Computer Networks and Communications, Hardware and Architecture, Media Technology, Software