Abstract
Human beings find narrative sequencing in written texts and moving imagery a relatively simple task. Success depends on establishing coherence: using critical cues to identify key characters, objects, actions and locations as they contribute to plot development.
In the drive to make audiovisual media more widely accessible (through audio description), and media archives more searchable (through content description), computer vision experts strive to automate video captioning in order to supplement human description activities. Existing models for automating video descriptions employ deep convolutional neural networks for encoding visual material and feature extraction (Krizhevsky, Sutskever, & Hinton, 2012; Szegedy et al., 2015; He, Zhang, Ren, & Sun, 2016). Recurrent neural networks decode the visual encodings and supply a sentence that describes the moving images in a manner mimicking human performance. However, these descriptions are currently “blind” to narrative coherence.
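The encoder-decoder pipeline described above can be sketched in miniature. The following is a hypothetical toy illustration only: the mean-pooling "encoder" stands in for a deep CNN (e.g. a pretrained ResNet), the greedy word-by-word loop stands in for a trained recurrent decoder, and all weights, dimensions and vocabulary entries are invented for the sketch, not taken from any model in the study.

```python
import math
import random

# Toy sketch of a CNN-encoder / RNN-decoder video captioner.
# Hypothetical throughout: a real system uses a pretrained CNN encoder
# and a trained LSTM/GRU decoder, not random weights.

random.seed(0)
VOCAB = ["<end>", "a", "man", "walks", "his", "dog"]  # invented vocabulary
FEAT_DIM, HID_DIM = 8, 12                             # invented sizes

def rand_matrix(rows, cols):
    return [[random.gauss(0, 1) for _ in range(cols)] for _ in range(rows)]

def matvec(M, v):
    return [sum(w * x for w, x in zip(row, v)) for row in M]

def encode_frames(frames):
    """Stand-in for the CNN encoder: mean-pool per-frame feature vectors."""
    n = len(frames)
    return [sum(f[i] for f in frames) / n for i in range(FEAT_DIM)]

def decode_caption(feat, W_in, W_out, max_len=6):
    """Stand-in for the RNN decoder: greedy word-by-word generation."""
    # Initialise the hidden state from the visual encoding.
    h = [math.tanh(x) for x in matvec(W_in, feat)]
    words = []
    for _ in range(max_len):
        logits = matvec(W_out, h)  # project hidden state onto the vocabulary
        idx = max(range(len(VOCAB)), key=lambda i: logits[i])
        if VOCAB[idx] == "<end>":
            break
        words.append(VOCAB[idx])
        # Feed the chosen word's output weights back into the state.
        h = [math.tanh(a + b) for a, b in zip(h, W_out[idx])]
    return words

frames = [[random.gauss(0, 1) for _ in range(FEAT_DIM)] for _ in range(4)]
W_in = rand_matrix(HID_DIM, FEAT_DIM)
W_out = rand_matrix(len(VOCAB), HID_DIM)
caption = decode_caption(encode_frames(frames), W_in, W_out)
print(caption)
```

Because the decoder chooses each word only from the current hidden state, nothing in this loop tracks who a character is across shots or how one action follows another, which is precisely the "blindness" to narrative coherence the abstract notes.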
Our study examines the human approach to narrative sequencing and coherence creation using the MeMAD [Methods for Managing Audiovisual Data: Combining Automatic Efficiency with Human Accuracy] film corpus, comprising five hundred extracts chosen as stand-alone narrative arcs. We examine character recognition, object detection and temporal continuity as indicators of coherence, using linguistic analysis and qualitative assessments to inform the development of more narratively sophisticated computer models in the future.
Publisher
European Association for Studies in Screen Translation
Cited by
4 articles.