A Pipeline for Story Visualization from Natural Language
Published: 2023-04-19
Container-title: Applied Sciences
Volume: 13, Issue: 8, Page: 5107
ISSN: 2076-3417
Language: en
Author:
Zakraoui Jezia 1, Saleh Moutaz 1, Al-Maadeed Somaya 1, Alja’am Jihad Mohamad 1
Affiliation:
1. Department of Computer Science, Qatar University, Doha 2713, Qatar
Abstract
Automatically generating visualizations from natural language text is an important task for promoting language learning and literacy development in young children and language learners. However, translating a text into a coherent visualization that matches its relevant keywords is a challenging problem. To tackle this issue, we proposed a robust story visualization pipeline spanning NLP and relation extraction through image sequence generation and alignment. First, we applied a shallow semantic representation of the text, extracting concepts such as relevant characters, scene objects, and events in an appropriate format. We also distinguished between simple and complex actions; this distinction helped realize an optimal visualization of the scene objects and their relationships according to the target audience. Second, we utilized an image generation framework, along with its different versions, to support the visualization task efficiently. Third, we used the CLIP similarity function as a semantic relevance metric to check local and global coherence with the whole story. Finally, we validated the scene sequence to compose the final visualization, using the different versions for various target audiences. Our preliminary results showed that such a pipeline is effective for a coarse visualization task that can subsequently be enhanced.
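The coherence check described in the abstract scores each generated image against its source sentence with CLIP and aggregates the scores over the story. The sketch below illustrates that idea only; it is not the authors' implementation, and it assumes the Hugging Face transformers CLIP checkpoint openai/clip-vit-base-patch32 as the backbone, with clip_similarity and story_coherence as illustrative helper names.

```python
# Illustrative sketch of a CLIP-based coherence check (local and global).
# Assumption: the Hugging Face "transformers" CLIP checkpoint below is used as
# the backbone; the paper may rely on a different CLIP variant or scoring scheme.
from PIL import Image
import torch
from transformers import CLIPModel, CLIPProcessor

MODEL_NAME = "openai/clip-vit-base-patch32"  # assumed checkpoint, not from the paper
model = CLIPModel.from_pretrained(MODEL_NAME).eval()
processor = CLIPProcessor.from_pretrained(MODEL_NAME)

def clip_similarity(sentence: str, image: Image.Image) -> float:
    """Cosine similarity between one story sentence and one candidate scene image."""
    inputs = processor(text=[sentence], images=image,
                       return_tensors="pt", padding=True, truncation=True)
    with torch.no_grad():
        text_emb = model.get_text_features(input_ids=inputs["input_ids"],
                                           attention_mask=inputs["attention_mask"])
        img_emb = model.get_image_features(pixel_values=inputs["pixel_values"])
    # Normalize embeddings so the dot product equals cosine similarity.
    text_emb = text_emb / text_emb.norm(dim=-1, keepdim=True)
    img_emb = img_emb / img_emb.norm(dim=-1, keepdim=True)
    return float((text_emb @ img_emb.T).item())

def story_coherence(sentences, images):
    """Local score per (sentence, image) pair plus a global mean over the story."""
    local = [clip_similarity(s, img) for s, img in zip(sentences, images)]
    return local, sum(local) / len(local)
```

In this sketch, a low local score would flag an image as semantically inconsistent with its sentence, while the global mean serves as a rough proxy for coherence with the whole story.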
Funder
Qatar National Research Fund
Subject
Fluid Flow and Transfer Processes,Computer Science Applications,Process Chemistry and Technology,General Engineering,Instrumentation,General Materials Science
Cited by: 1 article