A Multi-Modal Story Generation Framework with AI-Driven Storyline Guidance

Published: 2023-03-08 Issue: 6 Volume: 12 Page: 1289
ISSN: 2079-9292
Container-title: Electronics
Language: en
Short-container-title: Electronics

Author:

Kim Juntae¹, Heo Yoonseok¹, Yu Hogeon², Nang Jongho¹

Affiliation:

1. Department of Computer Science and Engineering, Sogang University, Seoul 04107, Republic of Korea

2. Department of Electronic Engineering, Sogang University, Seoul 04107, Republic of Korea

Abstract

An automatic story generation system continuously generates stories with a natural plot. The major challenge of automatic story generation is maintaining coherence between consecutively generated stories without human intervention. To address this, we propose a novel multi-modal story generation framework with automated storyline decision-making. Our framework consists of three independent models: a transformer encoder-based storyline guidance model, which predicts a storyline by framing the choice as a multiple-choice question-answering problem; a transformer decoder-based story generation model, which writes a story describing the storyline chosen by the guidance model; and a diffusion-based story visualization model, which generates a representative image of a scene to help readers follow the flow of the story. We evaluated the framework extensively with both automatic and human evaluations. The results show that our model outperforms the previous approach, suggesting that the storyline guidance model is effective at planning appropriate storylines.
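The abstract frames storyline selection as a multiple-choice question: given the story so far, pick the best of several candidate storylines, then hand it to the generator and the visualizer. The Python sketch below illustrates only that framing and the hand-off between the three stages, under loud assumptions: `overlap_score` is a hypothetical word-overlap stand-in for the paper's transformer encoder scorer, and `generate_story` and `visualize` are hypothetical stubs standing in for the decoder-based generator and the diffusion-based visualizer. It is not the authors' implementation.

```python
import math
from typing import Callable, List, Sequence, Tuple


def overlap_score(context: str, candidate: str) -> float:
    """HYPOTHETICAL stand-in for a transformer encoder scoring head.

    Returns the fraction of the candidate's distinct words that also
    appear in the context. A real system would encode the
    (context, candidate) pair and output a learned logit instead.
    """
    ctx = set(context.lower().split())
    cand = set(candidate.lower().split())
    if not cand:
        return 0.0
    return len(ctx & cand) / len(cand)


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax over the per-choice scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def choose_storyline(
    context: str,
    candidates: Sequence[str],
    scorer: Callable[[str, str], float] = overlap_score,
) -> Tuple[int, List[float]]:
    """Multiple-choice framing: score every candidate against the same
    context, normalize across choices, and return the argmax index."""
    if not candidates:
        raise ValueError("at least one candidate storyline is required")
    logits = [scorer(context, c) for c in candidates]
    probs = softmax(logits)
    best = max(range(len(candidates)), key=lambda i: probs[i])
    return best, probs


def generate_story(storyline: str) -> str:
    # HYPOTHETICAL stub for the transformer decoder-based story generator.
    return f"Story segment following: {storyline}"


def visualize(story: str) -> str:
    # HYPOTHETICAL stub for the diffusion-based visualization model;
    # returns a placeholder string instead of an image.
    return f"<image for: {story[:30]}>"


def story_step(context: str, candidates: Sequence[str]) -> Tuple[str, str, str]:
    """One guidance -> generation -> visualization step."""
    idx, _ = choose_storyline(context, candidates)
    storyline = candidates[idx]
    story = generate_story(storyline)
    return storyline, story, visualize(story)


if __name__ == "__main__":
    context = "the knight entered the dark forest looking for the dragon"
    options = [
        "the knight finds the dragon in the forest",
        "a chef bakes bread in the city",
        "the weather is sunny",
    ]
    print(story_step(context, options))
```

Because the softmax is taken across the candidates for a single context, the guidance step always commits to exactly one storyline, which is what lets the generator run without a human choosing the plot. In a continuous setting, one would append each generated segment to `context` and call `story_step` again with a fresh set of candidates.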

Funder

Institute of Information & Communications Technology Planning & Evaluation

Publisher

MDPI AG

Subject

Electrical and Electronic Engineering, Computer Networks and Communications, Hardware and Architecture, Signal Processing, Control and Systems Engineering

Link

https://www.mdpi.com/2079-9292/12/6/1289/pdf

References: 57 articles

1. Radford, A., Wu, J., Child, R., Luan, D., Amodei, D., and Sutskever, I. (2019). Language models are unsupervised multitask learners. OpenAI Blog.

2. Lewis, M., Liu, Y., Goyal, N., Ghazvininejad, M., Mohamed, A., Levy, O., and Zettlemoyer, L. (2020, July 5–10). BART: Denoising sequence-to-sequence pre-training for natural language generation, translation, and comprehension. Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics (ACL), Online.

3. Brown, T., Mann, B., Ryder, N., Subbiah, M., Kaplan, J.D., Dhariwal, P., and Amodei, D. (2020, December 6–12). Language models are few-shot learners. Proceedings of the Advances in Neural Information Processing Systems (NeurIPS), Online.

4. Raffel, C., Shazeer, N., Roberts, A., Lee, K., Narang, S., Matena, M., and Liu, P.J. (2020). Exploring the limits of transfer learning with a unified text-to-text transformer. J. Mach. Learn. Res.

5. Xu, F., Wang, X., Ma, Y., Tresp, V., Wang, Y., Zhou, S., and Du, H. (2020, October 19–23). Controllable Multi-Character Psychology-Oriented Story Generation. Proceedings of the 29th ACM International Conference on Information & Knowledge Management (CIKM), Online.