Foundation models meet visualizations: Challenges and opportunities

Published:2024-05-02 Issue:3 Volume:10 Page:399-424
ISSN:2096-0433
Container-title:Computational Visual Media
language:en
Short-container-title:Comp. Visual Media

Author:

Yang Weikai,Liu Mengchen,Wang Zheng,Liu Shixia

Abstract

Recent studies have indicated that foundation models, such as BERT and GPT, excel at adapting to various downstream tasks. This adaptability has made them a dominant force in building artificial intelligence (AI) systems. Moreover, a new research paradigm has emerged as visualization techniques are incorporated into these models. This study divides these intersections into two research areas: visualization for foundation model (VIS4FM) and foundation model for visualization (FM4VIS). In terms of VIS4FM, we explore the primary role of visualizations in understanding, refining, and evaluating these intricate foundation models. VIS4FM addresses the pressing need for transparency, explainability, fairness, and robustness. Conversely, in terms of FM4VIS, we highlight how foundation models can be used to advance the visualization field itself. The intersection of foundation models with visualizations is promising but also introduces a set of challenges. By highlighting these challenges and promising opportunities, this study aims to provide a starting point for the continued exploration of this research avenue.

Publisher

Springer Science and Business Media LLC

Link

https://link.springer.com/content/pdf/10.1007/s41095-023-0393-x.pdf

Reference: 129 articles.

1. Bommasani, R.; Hudson, D. A.; Adeli, E.; Altman, R.; Arora, S.; von Arx, S.; Bernstein, M. S.; Bohg, J.; Bosselut, A.; Brunskill, E.; et al. On the opportunities and risks of foundation models. arXiv preprint arXiv:2108.07258, 2021.

2. Devlin, J.; Chang, M. W.; Lee, K.; Toutanova, K. BERT: Pretraining of deep bidirectional transformers for language understanding. In: Proceedings of the Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), 4171–4186, 2019.

3. Dosovitskiy, A.; Beyer, L.; Kolesnikov, A.; Weissenborn, D.; Zhai, X.; Unterthiner, T.; Dehghani, M.; Minderer, M.; Heigold, G.; Gelly, S.; et al. An image is worth 16×16 words: Transformers for image recognition at scale. In: Proceedings of the International Conference on Learning Representations, 2021.

4. Wang, W.; Dai, J.; Chen, Z.; Huang, Z.; Li, Z.; Zhu, X.; Hu, X.; Lu, T.; Lu, L.; Li, H.; et al. InternImage: Exploring large-scale vision foundation models with deformable convolutions. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 14408–14419, 2023.

5. Radford, A.; Kim, J. W.; Hallacy, C.; Ramesh, A.; Goh, G.; Agarwal, S.; Sastry, G.; Askell, A.; Mishkin, P.; Clark, J.; et al. Learning transferable visual models from natural language supervision. In: Proceedings of the 38th International Conference on Machine Learning, 8748–8763, 2021.

Cited by 7 articles.

1. BF-SAM: enhancing SAM through multi-modal fusion for fine-grained building function identification;International Journal of Geographical Information Science;2024-09-05

2. Sticky Links: Encoding Quantitative Data of Graph Edges;IEEE Transactions on Visualization and Computer Graphics;2024-06

3. TacPrint: Visualizing the Biomechanical Fingerprint in Table Tennis;IEEE Transactions on Visualization and Computer Graphics;2024-06

4. JsonCurer: Data Quality Management for JSON Based on an Aggregated Schema;IEEE Transactions on Visualization and Computer Graphics;2024-06

5. Enhancing Single-Frame Supervision for Better Temporal Action Localization;IEEE Transactions on Visualization and Computer Graphics;2024-06