Abstract
In this work, we report on a novel application of Locality Sensitive Hashing (LSH) to seismic data at scale. Based on the high waveform similarity between reoccurring earthquakes, our application identifies potential earthquakes by searching for similar time series segments via LSH. However, a straightforward implementation of this LSH-enabled application has difficulty scaling beyond three months of continuous time series data measured at a single seismic station. As a case study of a data-driven science workflow, we illustrate how domain knowledge can be incorporated into the workload to improve both efficiency and result quality. We describe several end-to-end optimizations of the analysis pipeline, from pre-processing to post-processing, which allow the application to scale to time series data measured at multiple seismic stations. Together, these optimizations yield a speedup of more than 100× in the end-to-end analysis pipeline. This improved scalability enabled seismologists to perform seismic analysis on more than ten years of continuous time series data from over ten seismic stations, and has directly enabled the discovery of 597 new earthquakes near the Diablo Canyon nuclear power plant in California and 6123 new earthquakes in New Zealand.
Subject
General Earth and Planetary Sciences; Water Science and Technology; Geography, Planning and Development
Cited by
28 articles.
1. Data Management for ML-Based Analytics and Beyond;ACM / IMS Journal of Data Science;2024-01-16
2. TSM-Bench: Benchmarking Time Series Database Systems for Monitoring Applications;Proceedings of the VLDB Endowment;2023-07
3. Machine Learning for the Geosciences;Machine Learning for Data Science Handbook;2023
4. Relation-aware Blocking for Scalable Recommendation Systems;Proceedings of the 31st ACM International Conference on Information & Knowledge Management;2022-10-17
5. Cronus: Computer Vision-based Machine Intelligent Hybrid Memory Management;Proceedings of the 2022 International Symposium on Memory Systems;2022-10-03