Efficient Large-Scale GPS Trajectory Compression on Spark: A Pipeline-Based Approach
-
Published:2023-08-24
Issue:17
Volume:12
Page:3569
-
ISSN:2079-9292
-
Container-title:Electronics
-
language:en
-
Short-container-title:Electronics
Author:
Xiong Wen (1, 2), Wang Xiaoxuan (1, 2), Li Hao (1)
Affiliation:
1. School of Information, Yunnan Normal University, Kunming 650500, China
2. Engineering Research Center of Computer Vision and Intelligent Control Technology, Yunnan Provincial Department of Education, Kunming 650500, China
Abstract
Every day, hundreds of thousands of vehicles, including buses, taxis, and ride-hailing cars, continuously generate GPS positioning records. At the same time, the traffic big data platforms of urban transportation systems have already collected a large volume of GPS trajectory data. These incremental and historical GPS datasets require ever more storage space, placing unprecedented cost pressure on the big data platform. It is therefore imperative to compress these large-scale GPS trajectory datasets efficiently, saving both storage cost and subsequent computing cost. However, classical trajectory compression algorithms can only be executed in a single-threaded manner on a single node, so they cannot compress this incremental data, which often amounts to hundreds of gigabytes, within an acceptable time frame. This paper uses Spark, a popular big data processing engine, to parallelize a set of classical trajectory compression algorithms: DP (Douglas–Peucker), TD-TR (Top-Down Time-Ratio), SW (Sliding Window), SQUISH (Spatial Quality Simplification Heuristic), and V-DP (Velocity-Aware Douglas–Peucker). We systematically evaluate the parallelized algorithms on a very large GPS trajectory dataset containing 117.5 GB of data produced by 20,000 taxis. The experimental results show that: (1) compressing this dataset takes only 438 s on a Spark cluster with 14 nodes; (2) the parallelized algorithms reduce storage cost by 26% on average and by up to 40%. In addition, we design and implement a pipeline-based solution that automatically performs preprocessing and compression for continuous GPS trajectories on the Spark platform.
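As a rough illustration of the kind of algorithm the abstract names, the following is a minimal Python sketch of Douglas–Peucker simplification applied independently to each vehicle's trajectory. It is not the authors' implementation. The function names (`perpendicular_distance`, `douglas_peucker`, `compress_by_vehicle`), the record layout `(vehicle_id, timestamp, x, y)`, and the use of planar coordinates are all assumptions made for this example. Grouping by vehicle is done here with a plain dictionary; on Spark, the same per-vehicle step could plausibly be expressed as a keyed operation such as `groupByKey` followed by `mapValues`, since each trajectory can be compressed without reference to the others.

```python
import math
from collections import defaultdict


def perpendicular_distance(p, a, b):
    """Distance from point p to the line through a and b (planar approximation)."""
    (x, y), (x1, y1), (x2, y2) = p, a, b
    dx, dy = x2 - x1, y2 - y1
    if dx == 0 and dy == 0:
        # a and b coincide: fall back to point-to-point distance
        return math.hypot(x - x1, y - y1)
    return abs(dy * (x - x1) - dx * (y - y1)) / math.hypot(dx, dy)


def douglas_peucker(points, epsilon):
    """Keep the endpoints plus every point farther than epsilon from its chord.

    Iterative (stack-based) form, so long trajectories do not hit
    Python's recursion limit.
    """
    if len(points) < 3:
        return list(points)
    keep = [False] * len(points)
    keep[0] = keep[-1] = True
    stack = [(0, len(points) - 1)]
    while stack:
        start, end = stack.pop()
        max_dist, index = 0.0, -1
        for i in range(start + 1, end):
            d = perpendicular_distance(points[i], points[start], points[end])
            if d > max_dist:
                max_dist, index = d, i
        if index != -1 and max_dist > epsilon:
            keep[index] = True
            stack.append((start, index))
            stack.append((index, end))
    return [p for p, k in zip(points, keep) if k]


def compress_by_vehicle(records, epsilon):
    """Group (vehicle_id, timestamp, x, y) records (hypothetical layout) by
    vehicle, sort each group by time, and simplify each trajectory on its own."""
    groups = defaultdict(list)
    for vehicle_id, ts, x, y in records:
        groups[vehicle_id].append((ts, x, y))
    compressed = {}
    for vehicle_id, pts in groups.items():
        pts.sort()  # order by timestamp
        compressed[vehicle_id] = douglas_peucker([(x, y) for _, x, y in pts], epsilon)
    return compressed
```

In this sketch `epsilon` is in the same units as the coordinates. Real GPS records in latitude and longitude would need a projection or a geodesic distance, which is omitted here.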
Funder
National Natural Science Foundation of China
Subject
Electrical and Electronic Engineering,Computer Networks and Communications,Hardware and Architecture,Signal Processing,Control and Systems Engineering
Cited by
3 articles.