Speaker-Following Video Subtitles

Published: 2015-01-07, Volume: 11, Issue: 2, Pages: 1–17
ISSN: 1551-6857
Journal: ACM Transactions on Multimedia Computing, Communications, and Applications
Language: English
Journal abbreviation: ACM Trans. Multimedia Comput. Commun. Appl.

Authors:

Hu Yongtao¹, Kautz Jan², Yu Yizhou¹, Wang Wenping¹

Affiliations:

1. The University of Hong Kong

2. University College London

Abstract

We propose a new method for improving the presentation of subtitles in video (e.g., TV and movies). With conventional subtitles, the viewer has to constantly look away from the main viewing area to read the subtitles at the bottom of the screen, which disrupts the viewing experience and causes unnecessary eyestrain. Our method places on-screen subtitles next to the respective speakers to allow the viewer to follow the visual content while simultaneously reading the subtitles. We use novel identification algorithms to detect the speakers based on audio and visual information. Then the placement of the subtitles is determined using global optimization. A comprehensive usability study indicated that our subtitle placement method outperformed both conventional fixed-position subtitling and a previous dynamic subtitling method in terms of enhancing the overall viewing experience and reducing eyestrain.

Publisher

Association for Computing Machinery (ACM)

Subject

Computer Networks and Communications, Hardware and Architecture

Link

https://dl.acm.org/doi/pdf/10.1145/2632111

References (27 articles)

1. Speaker Diarization: A Review of Recent Research

2. B. Chun, D. Ryu, W. Hwang, and H. Cho. 2006. An automated procedure for word balloon placement in cinema comics. Adv. Visual Comput. 576–585. doi:10.1007/11919629_58

3. J. Driver. 1996. Enhancement of selective listening by illusory mislocation of speech sounds due to lip-reading. Nature 381, 6577, 66–68.

Cited by 34 articles.

1. “Caption It in an Accessible Way That Is Also Enjoyable”: Characterizing User-Driven Captioning Practices on TikTok. Proceedings of the CHI Conference on Human Factors in Computing Systems, 2024-05-11.

2. Unspoken Sound: Identifying Trends in Non-Speech Audio Captioning on YouTube. Proceedings of the CHI Conference on Human Factors in Computing Systems, 2024-05-11.

3. Eye Gaze Analysis Towards an AI System for Dynamic Content Layout. Lecture Notes in Computer Science, 2024.

4. Automated Conversion of Music Videos into Lyric Videos. Proceedings of the 36th Annual ACM Symposium on User Interface Software and Technology, 2023-10-29.

5. Focus on the Motion: Designing Adaptive Subtitles for Online Fitness Videos to Support Ubiquitous Exercises. 2023 IEEE International Symposium on Mixed and Augmented Reality Adjunct (ISMAR-Adjunct), 2023-10-16.