Authors:
Yitong Li, Martin Min, Dinghan Shen, David Carlson, Lawrence Carin
Abstract
Generating videos from text has proven to be a significant challenge for existing generative models. We tackle this problem by training a conditional generative model to extract both static and dynamic information from text. This is manifested in a hybrid framework, employing a Variational Autoencoder (VAE) and a Generative Adversarial Network (GAN). The static features, called "gist," are used to sketch text-conditioned background color and object layout structure. Dynamic features are considered by transforming input text into an image filter. To obtain a large amount of data for training the deep-learning model, we develop a method to automatically create a matched text-video corpus from publicly available online videos. Experimental results show that the proposed framework generates plausible and diverse short-duration smooth videos, while accurately reflecting the input text information. It significantly outperforms baseline models that directly adapt text-to-image generation procedures to produce videos. Performance is evaluated both visually and by adapting the inception score used to evaluate image generation in GANs.
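The abstract's "dynamic features" are obtained by turning the input text into an image filter that is applied to the static "gist." A minimal sketch of that text-to-filter idea, in NumPy with hypothetical dimensions and a single linear mapping standing in for the paper's learned networks (`TEXT_DIM`, `GIST_SIZE`, `K`, and all weights here are illustrative assumptions, not the authors' architecture):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions, chosen for illustration only.
TEXT_DIM, GIST_SIZE, K = 16, 8, 3  # text embedding dim, gist image size, kernel size

def text_to_filter(text_emb, w):
    """Map a text embedding to a KxK convolution kernel.
    Sketch of the 'text as image filter' idea via one linear layer."""
    return (w @ text_emb).reshape(K, K)

def apply_filter(gist, kernel):
    """'Same'-padded 2-D convolution of the gist image with the text filter."""
    pad = K // 2
    padded = np.pad(gist, pad)
    out = np.empty_like(gist)
    for i in range(gist.shape[0]):
        for j in range(gist.shape[1]):
            out[i, j] = np.sum(padded[i:i + K, j:j + K] * kernel)
    return out

text_emb = rng.standard_normal(TEXT_DIM)            # stand-in for an encoded sentence
w = rng.standard_normal((K * K, TEXT_DIM))          # hypothetical learned weights
gist = rng.standard_normal((GIST_SIZE, GIST_SIZE))  # stand-in for a VAE-sampled gist

features = apply_filter(gist, text_to_filter(text_emb, w))
print(features.shape)  # (8, 8)
```

In the paper these filtered features feed a GAN generator that produces the video frames; this fragment only illustrates how text can parameterize a convolution rather than merely condition an input.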
Publisher
Association for the Advancement of Artificial Intelligence (AAAI)
Cited by
62 articles.
1. Exploring Brazilian Teachers’ Perceptions and a priori Needs to Design Smart Classrooms;International Journal of Artificial Intelligence in Education;2024-07-12
2. Text-driven Video Prediction;ACM Transactions on Multimedia Computing, Communications, and Applications;2024-06-27
3. AcademicVid: Academic PDFs to video generation - Exploratory Literature Survey;2024 International Conference on Emerging Technologies in Computer Science for Interdisciplinary Applications (ICETCS);2024-04-22
4. Vidgen: Long-Form Text-to-Video Generation with Temporal, Narrative and Visual Consistency for High Quality Story-Visualisation Tasks;2024 IEEE 9th International Conference for Convergence in Technology (I2CT);2024-04-05
5. CLOVR: Collecting and Logging OpenVR Data from SteamVR Applications;2024 IEEE Conference on Virtual Reality and 3D User Interfaces Abstracts and Workshops (VRW);2024-03-16