Efficient Large-Scale GPS Trajectory Compression on Spark: A Pipeline-Based Approach

Published: 2023-08-24 Issue: 17 Volume: 12 Page: 3569
ISSN: 2079-9292
Container-title: Electronics
Language: en
Short-container-title: Electronics

Authors:

Xiong Wen¹,², Wang Xiaoxuan¹,², Li Hao¹

Affiliation:

1. School of Information, Yunnan Normal University, Kunming 650500, China

2. Engineering Research Center of Computer Vision and Intelligent Control Technology, Yunnan Provincial Department of Education, Kunming 650500, China

Abstract

Every day, hundreds of thousands of vehicles, including buses, taxis, and ride-hailing cars, continuously generate GPS positioning records. At the same time, the traffic big data platforms of urban transportation systems have already accumulated large volumes of historical GPS trajectory data. Together, these incremental and historical datasets demand ever more storage space, placing unprecedented cost pressure on big data platforms. It is therefore imperative to compress these large-scale GPS trajectory datasets efficiently, reducing both storage costs and subsequent computing costs. However, these classical trajectory compression algorithms can only run single-threaded on a single node, so they cannot compress the incremental data, which often amounts to hundreds of gigabytes, within an acceptable time frame. This paper uses Spark, a popular big data processing engine, to parallelize five classical trajectory compression algorithms: DP (Douglas–Peucker), TD-TR (Top-Down Time-Ratio), SW (Sliding Window), SQUISH (Spatial Quality Simplification Heuristic), and V-DP (Velocity-Aware Douglas–Peucker). We systematically evaluate these parallelized algorithms on a very large GPS trajectory dataset containing 117.5 GB of data produced by 20,000 taxis. The experimental results show that (1) compressing this dataset takes only 438 s on a 14-node Spark cluster, and (2) the parallelized algorithms reduce storage costs by 26% on average and by up to 40%. In addition, we design and implement a pipeline-based solution that automatically preprocesses and compresses continuous GPS trajectories on the Spark platform.
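The abstract does not say how the algorithms are parallelized. One common way to run a per-trajectory algorithm on Spark, assumed here rather than taken from the paper, is to key each GPS record by vehicle ID, group the records into trajectories, and compress each trajectory independently. The plain-Python sketch below illustrates that idea without Spark, using two of the listed algorithms: DP, which measures the perpendicular distance to the anchor segment, and TD-TR, which replaces it with a time-synchronized distance. The function names, the (timestamp, x, y) record layout with projected planar coordinates, and the tolerance values are all illustrative assumptions.

```python
import math
from collections import defaultdict
from typing import Callable, Dict, Iterable, List, Tuple

# A GPS fix as (timestamp_s, x, y); x and y are assumed to be projected planar
# coordinates (e.g., metres), not raw latitude/longitude.
Point = Tuple[float, float, float]
Distance = Callable[[Point, Point, Point], float]


def perpendicular_distance(p: Point, a: Point, b: Point) -> float:
    """DP criterion: distance from p to the straight line through a and b."""
    _, px, py = p
    _, ax, ay = a
    _, bx, by = b
    dx, dy = bx - ax, by - ay
    length = math.hypot(dx, dy)
    if length == 0.0:
        return math.hypot(px - ax, py - ay)
    return abs(dy * (px - ax) - dx * (py - ay)) / length


def synchronized_distance(p: Point, a: Point, b: Point) -> float:
    """TD-TR criterion: distance from p to where it would be if the object
    moved from a to b at constant speed (time-ratio interpolation)."""
    pt, px, py = p
    at, ax, ay = a
    bt, bx, by = b
    ratio = 0.0 if bt == at else (pt - at) / (bt - at)
    ex, ey = ax + ratio * (bx - ax), ay + ratio * (by - ay)
    return math.hypot(px - ex, py - ey)


def top_down(points: List[Point], epsilon: float, dist: Distance) -> List[Point]:
    """Top-down simplification shared by DP and TD-TR.

    Keeps the point farthest from the current anchor segment whenever that
    distance exceeds epsilon, then splits there. An explicit stack replaces
    recursion so long trajectories cannot hit Python's recursion limit."""
    n = len(points)
    if n <= 2:
        return list(points)
    keep = [False] * n
    keep[0] = keep[-1] = True
    stack = [(0, n - 1)]
    while stack:
        start, end = stack.pop()
        max_d, index = -1.0, -1
        for i in range(start + 1, end):
            d = dist(points[i], points[start], points[end])
            if d > max_d:
                max_d, index = d, i
        if index != -1 and max_d > epsilon:
            keep[index] = True
            stack.append((start, index))
            stack.append((index, end))
    return [p for p, k in zip(points, keep) if k]


def compress_by_vehicle(records: Iterable[Tuple[str, Point]], epsilon: float,
                        dist: Distance) -> Dict[str, List[Point]]:
    """Hypothetical per-vehicle step: group fixes by vehicle ID, sort each
    group by timestamp, and compress each trajectory independently.
    This mirrors a Spark groupByKey().mapValues(...) stage in plain Python."""
    trajectories: Dict[str, List[Point]] = defaultdict(list)
    for vehicle_id, point in records:
        trajectories[vehicle_id].append(point)
    return {vid: top_down(sorted(pts), epsilon, dist)
            for vid, pts in trajectories.items()}


records = [
    ("taxi-1", (0.0, 0.0, 0.0)), ("taxi-1", (2.0, 4.0, 0.0)),
    ("taxi-1", (1.0, 3.0, 0.0)),  # arrives out of order; sorting fixes it
    ("taxi-2", (0.0, 0.0, 0.0)), ("taxi-2", (1.0, 1.0, 1.0)),
    ("taxi-2", (2.0, 2.0, 2.0)),
]
dp = compress_by_vehicle(records, 0.5, perpendicular_distance)
tdtr = compress_by_vehicle(records, 0.5, synchronized_distance)
print(dp["taxi-1"])    # [(0.0, 0.0, 0.0), (2.0, 4.0, 0.0)]
print(tdtr["taxi-1"])  # [(0.0, 0.0, 0.0), (1.0, 3.0, 0.0), (2.0, 4.0, 0.0)]
```

In PySpark, the per-trajectory part would roughly correspond to `rdd.groupByKey().mapValues(lambda pts: top_down(sorted(pts), eps, dist))` on an RDD of `(vehicle_id, point)` pairs; whether the authors structured their jobs this way is not stated in the abstract. The example also shows why TD-TR can keep more points than DP: the middle fix of taxi-1 lies on the straight path, so it is spatially redundant, but it was recorded at an unexpected time, so it carries speed information that TD-TR preserves.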

Funder

National Natural Science Foundation of China

Publisher

MDPI AG

Subject

Electrical and Electronic Engineering, Computer Networks and Communications, Hardware and Architecture, Signal Processing, Control and Systems Engineering

Link

https://www.mdpi.com/2079-9292/12/17/3569/pdf

References: 37 articles.

1. Zheng, Y. (2015). Trajectory data mining: An overview. ACM Trans. Intell. Syst. Technol.

2. Liang, M., Chen, W.J., Duan, P., and Li, J. (2019). Evaluation for typical compression method of trajectory data. Bull. Surv. Mapp., 60–64.

3. Douglas, D.H., and Peucker, T.K. (1973). Algorithms for the reduction of the number of points required to represent a digitized line or its caricature. Cartogr. Int. J. Geogr. Inf. Geovisualization.

4. Meratnia, N., and de By, R.A. (2004, March 14–18). Spatiotemporal compression techniques for moving point objects. Proceedings of the International Conference on Extending Database Technology, Heraklion, Crete, Greece.

5. Keogh, E., Chu, S., Hart, D., and Pazzani, M. (2001, November 29–December 2). An online algorithm for segmenting time series. Proceedings of the IEEE International Conference on Data Mining, San Jose, CA, USA.

Cited by 3 articles.

1. A New Trajectory Reduction Method for Mobile Devices Operating Both Online and Offline;Arabian Journal for Science and Engineering;2024-04-25

2. Toward ML-Based Application for Vehicles Operation Cost Management;Lecture Notes in Mechanical Engineering;2024

3. Polygon Simplification for the Efficient Approximate Analytics of Georeferenced Big Data;Sensors;2023-09-29