Lightly supervised alignment of subtitles on multi-genre broadcasts
-
Published:2018-05-29
Issue:23
Volume:77
Page:30533-30550
-
ISSN:1380-7501
-
Container-title:Multimedia Tools and Applications
-
language:en
-
Short-container-title:Multimed Tools Appl
Author:
Saz Oscar, Deena Salil, Doulaty Mortaza, Hasan Madina, Khaliq Bilal, Milner Rosanna, Ng Raymond W. M., Olcoz Julia, Hain Thomas
Abstract
This paper describes a system for aligning subtitles to audio on multi-genre broadcasts using a lightly supervised approach. Accurate alignment of subtitles plays a substantial role in the daily work of media companies and still requires considerable human effort. Here, a comprehensive approach to performing this task in an automated way using lightly supervised alignment is proposed. The paper explores the alternatives for speech segmentation, lightly supervised speech recognition and alignment of text streams. The proposed system uses lightly supervised decoding to improve alignment accuracy by performing language model adaptation using the target subtitles. The system thus built achieves the third best reported result in the alignment of broadcast subtitles in the Multi-Genre Broadcast (MGB) challenge, with an F1 score of 88.8%. This system is available for research and other non-commercial purposes through webASR, the University of Sheffield’s cloud-based speech technology web service. Taking as inputs an audio file and untimed subtitles, webASR can produce timed subtitles in multiple formats, including TTML, WebVTT and SRT.
Funder
Engineering and Physical Sciences Research Council
Publisher
Springer Science and Business Media LLC
Subject
Computer Networks and Communications,Hardware and Architecture,Media Technology,Software
Reference
46 articles.
1. Alvarez A, Mendes C, Raffaelli M, Luis T, Paulo S, Piccinini N, Arzelus H, Neto J, Aliprandi C, del Pozo A (2015) Automatic live and batch subtitling of multimedia contents for several European languages. Multimedia Tools and Applications. https://doi.org/10.1007/s11042-015-2794-z
2. Bell P, Gales M, Hain T, Kilgour J, Lanchantin P, Liu X, McParland A, Renals S, Saz O, Webster M, Woodland P (2015) The MGB challenge: Evaluating multi-genre broadcast media transcription. In: ASRU’15: Proc of IEEE workshop on automatic speech recognition and understanding, Scottsdale
3. Bell P, Gales MJF, Hain T, Kilgour J, Lanchantin P, Liu X, McParland A, Renals S, Saz O, Wester M, Woodland PC (2015) The MGB Challenge: evaluating multi-genre broadcast media recognition. In: Proceedings of the 2015 IEEE automatic speech recognition and understanding workshop, Scottsdale, pp 687–693
4. Bordel G, Peñagarikano M, Rodríguez-Fuentes LJ, Varona A (2012) A simple and efficient method to align very long speech signals to acoustically imperfect transcriptions. In: Proceedings of the 13th annual conference of the international speech communication association (Interspeech), Portland, pp 1840–1843
5. Braunschweiler N, Gales MJF, Buchholz S (2010) Lightly supervised recognition for automatic alignment of large coherent speech recordings. In: INTERSPEECH, pp 2222–2225
Cited by
5 articles.