Abstract
The volume of remote sensing images continues to grow as image sources become more diversified and with increasing spatial and spectral resolution. The handling of such large-volume datasets, which exceed available CPU memory, in a timely and efficient manner is becoming a challenge for single machines. The distributed cluster provides an effective solution with strong calculation power. There has been an increasing number of big data technologies that have been adopted to deal with large images using mature parallel technology. However, since most commercial big data platforms are not specifically developed for the remote sensing field, two main issues exist in processing large images with big data platforms using a distributed cluster. On the one hand, the quantities and categories of official algorithms used to process remote sensing images in big data platforms are limited compared to large amounts of sequential algorithms. On the other hand, the sequential algorithms employed directly to process large images in parallel over a distributed cluster may lead to incomplete objects in the tile edges and the generation of large communication volumes at the shuffle stage. It is, therefore, necessary to explore the distributed strategy and adapt the sequential algorithms over the distributed cluster. In this research, we employed two seed-based image segmentation algorithms to construct a distributed strategy based on the Spark platform. The proposed strategy focuses on modifying the incomplete objects by processing border areas and reducing the communication volume to a reasonable size by limiting the auxiliary bands and the buffer size to a small range during the shuffle stage. We calculated the F-measure and execution time to evaluate the accuracy and execution efficiency. The statistical data reveal that both segmentation algorithms maintained high accuracy, as achieved in the reference image segmented in the sequential way. Moreover, generally the strategy took less execution time compared to significantly larger auxiliary bands and buffer sizes. The proposed strategy can modify incomplete objects, with execution time being twice as fast as the strategies that do not employ communication volume reduction in the distributed cluster.
Funder
the National Key R&D Program of China
Subject
General Earth and Planetary Sciences
Cited by
7 articles.
订阅此论文施引文献
订阅此论文施引文献,注册后可以免费订阅5篇论文的施引文献,订阅后可以查看论文全部施引文献