Computing Short Films Using Language-Guided Diffusion and Vocoding Through Virtual Timelines of Summaries

Published: 2023-07-15
Issue: 10
Page: 71-89
ISSN: 2637-1898
Container-title: INSAM Journal of Contemporary Music, Art and Technology
Language: en
Short-container-title: INSAM

Author:

Luís Arandas¹, Miguel Carvalhais², Mick Grierson³

Affiliation:

1. University of Porto – INESC-TEC, Porto, Portugal

2. University of Porto – i2ADS, Porto, Portugal

3. University of the Arts London – CCI, London, United Kingdom

Abstract

Language-guided generative models are increasingly used in audiovisual production. Image diffusion allows for the development of video sequences and some of its coordination can be established by text prompts. This research automates a video production pipeline leveraging CLIP-guidance with longform text inputs and a separate text-to-speech system. We introduce a method for producing frame-accurate video and audio summaries using a virtual timeline and document a set of video outputs with diverging parameters. Our approach was applied in the production of the film Irreplaceable Biography and contributes to a future where multimodal generative architectures are set as underlying mechanisms to establish visual sequences in time. We contribute to a practice where language modelling is part of a shared and learned representation which can support professional video production, specifically used as a vehicle throughout the composition process as potential videography in physical space.

Funder

Fundação para a Ciência e a Tecnologia

Publisher

INSAM Institute for Contemporary Art, Music and Technology

Subject

General Medicine

References: 34 articles.

1. Akten, Memo, Rebecca Fiebrink, and Mick Grierson. 2019. “Learning to see: you are what you see”. ACM SIGGRAPH Art Gallery: 1–6.

2. Akten, Memo, Rebecca Fiebrink, and Mick Grierson. 2020. “Deep Meditations: Controlled navigation of latent space”. arXiv preprint arXiv:2003.00910.

3. Beltagy, Iz, Matthew E. Peters, and Arman Cohan. 2020. “Longformer: The long-document transformer”. arXiv preprint arXiv:2004.05150.

4. Bhat, Shariq Farooq, Ibraheem Alhashim, and Peter Wonka. 2021. “AdaBins: Depth estimation using adaptive bins”. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition.

5. Brooks, Tim, Aleksander Holynski, and Alexei A. Efros. 2022. “InstructPix2Pix: Learning to follow image editing instructions”. arXiv preprint arXiv:2211.09800.