Global–Local Deep Fusion: Semantic Integration with Enhanced Transformer in Dual-Branch Networks for Ultra-High Resolution Image Segmentation

Published:2024-06-23 Issue:13 Volume:14 Page:5443
ISSN:2076-3417
Container-title:Applied Sciences
language:en
Short-container-title:Applied Sciences

Author:

Liang Chenjing¹, Huang Kai¹, Mao Jian¹

Affiliation:

1. College of Computer Engineering, Jimei University, Xiamen 361021, China

Abstract

Fusing global contextual information with the details of locally cropped patches is crucial for segmenting ultra-high resolution images. This study introduces a novel fusion mechanism, global–local deep fusion (GL-Deep Fusion), based on an enhanced transformer architecture that efficiently integrates global context and local detail. Specifically, we propose the global–local synthesis networks (GLSNet), a dual-branch network in which one branch processes the entire original image while the other takes cropped local patches as input. Features from the two branches are fused through GL-Deep Fusion, which significantly improves the accuracy of ultra-high resolution image segmentation; the model is particularly effective at identifying tiny, overlapping objects. The dual-branch architecture is designed to optimize GPU memory utilization: it uses the features each branch extracts and integrates them into the enhanced transformer framework of GL-Deep Fusion. Benchmarks on the DeepGlobe and Vaihingen datasets demonstrate the efficiency and accuracy of the proposed model. On the DeepGlobe dataset, it reduces GPU memory usage by 24.1% while improving segmentation accuracy by 0.8% over the baseline model. On the Vaihingen dataset, it achieves a mean F1 score of 90.2% and an mIoU of 90.9%, highlighting its memory efficiency and segmentation precision.
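The dual-branch idea described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' GLSNet or GL-Deep Fusion implementation: the helper names (`downsample`, `crop_patches`, `cross_attention`), the patch size, the downsampling factor, and the use of raw pixel values as stand-in features are all assumptions made for illustration. It only shows the general pattern of a downsampled global view, cropped local patches, and an attention step in which local tokens query global tokens.

```python
import numpy as np

def downsample(image, factor):
    """Average-pool an (H, W, C) image by an integer factor (global-branch input)."""
    h, w, c = image.shape
    h2, w2 = h // factor, w // factor
    trimmed = image[:h2 * factor, :w2 * factor]
    return trimmed.reshape(h2, factor, w2, factor, c).mean(axis=(1, 3))

def crop_patches(image, patch):
    """Split an (H, W, C) image into non-overlapping square crops (local-branch input)."""
    h, w, c = image.shape
    rows, cols = h // patch, w // patch
    trimmed = image[:rows * patch, :cols * patch]
    tiles = trimmed.reshape(rows, patch, cols, patch, c).swapaxes(1, 2)
    return tiles.reshape(rows * cols, patch, patch, c)

def cross_attention(local_tokens, global_tokens):
    """Single-head scaled dot-product attention without learned projections.

    Local tokens act as queries; global tokens act as keys and values.
    A residual connection keeps the original local information.
    """
    d = local_tokens.shape[-1]
    scores = local_tokens @ global_tokens.T / np.sqrt(d)
    scores = scores - scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return local_tokens + weights @ global_tokens

rng = np.random.default_rng(0)
image = rng.random((512, 512, 3))      # stand-in for an ultra-high resolution image

global_view = downsample(image, 8)     # whole image at low resolution
patches = crop_patches(image, 128)     # full-resolution local crops

# Stand-in "features": pixels of the global view and of one downsampled patch.
global_tokens = global_view.reshape(-1, 3)
local_tokens = downsample(patches[0], 8).reshape(-1, 3)

fused = cross_attention(local_tokens, global_tokens)
print(global_view.shape, patches.shape, fused.shape)
```

In a real network the tokens would be feature maps produced by each branch's encoder, and the attention step would use learned query, key, and value projections; the sketch omits both to stay self-contained.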

Funder

Natural Science Foundation of Xiamen, China

Department of Education of the Fujian Province of China

Natural Science Foundation of Fujian Province of China

Xiamen Science and Technology Subsidy Project

Publisher

MDPI AG

Link

https://www.mdpi.com/2076-3417/14/13/5443/pdf

References: 47 articles.

1. Minaee et al. (2021). Image segmentation using deep learning: A survey. IEEE Trans. Pattern Anal. Mach. Intell.

2. Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., and Schiele, B. (2016, January 27–30). The cityscapes dataset for semantic urban scene understanding. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA.

3. Demir, I., Koperski, K., Lindenbaum, D., Pang, G., Huang, J., Basu, S., Hughes, F., Tuia, D., and Raskar, R. (2018, January 18–23). Deepglobe 2018: A challenge to parse the earth through satellite images. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, Salt Lake City, UT, USA.

4. Chen, L.C., Papandreou, G., Kokkinos, I., Murphy, K., and Yuille, A.L. (2014). Semantic image segmentation with deep convolutional nets and fully connected crfs. arXiv.

5. Chen et al. (2017). Deeplab: Semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected crfs. IEEE Trans. Pattern Anal. Mach. Intell.