Introducing Mplots: scaling time series recurrence plots to massive datasets

Published: 2024-07-20 Issue: 1 Volume: 11
ISSN:2196-1115
Container-title:Journal of Big Data
language:en
Short-container-title:J Big Data

Author:

Shahcheraghi Maryam,Mercer Ryan,Rodrigues João Manuel de Almeida,Der Audrey,Gamboa Hugo Filipe Silveira,Zimmerman Zachary,Mauck Kerry,Keogh Eamonn

Abstract

Time series similarity matrices (informally, recurrence plots or dot-plots) are useful tools for time series data mining. They can be used to guide data exploration, and various useful features can be derived from them and then fed into downstream analytics. However, time series similarity matrices scale very poorly, taxing both time and memory. In this work, we introduce novel ideas that increase, by several orders of magnitude, the size of the largest time series similarity matrices that can be examined. The first idea is a novel algorithm that computes the matrices in a way that removes the dependency on the subsequence length. This algorithm is so fast that it lets us address datasets where memory limitations begin to dominate. Our second novel contribution is a multiscale algorithm that computes an approximation of the matrix appropriate for the limits of the user's memory and screen resolution, then performs a local, just-in-time recomputation of any region that the user wishes to zoom in on. Given that this largely removes the time and space barriers, human visual attention then becomes the bottleneck. We further introduce algorithms that search massive matrices with quadrillions of cells and then prioritize regions for later examination by either humans or algorithms. We demonstrate the utility of our ideas for data exploration, segmentation, and classification in domains as diverse as astronomy, bioinformatics, entomology, and wildlife monitoring.
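
To make the object under discussion concrete, the following is a minimal brute-force sketch of a time series similarity matrix and of a coarse, screen-sized approximation of it. It is not the algorithm from the paper: the function names (`znorm`, `similarity_matrix`, `downsample_min`), the use of z-normalized Euclidean distance, and the choice of min-pooling for the coarse view are all assumptions made for illustration. The brute-force cost grows with the subsequence length `m`, which is exactly the dependency the abstract says the paper's first algorithm removes.

```python
import math


def znorm(values):
    """Z-normalize a sequence; a constant sequence maps to all zeros."""
    n = len(values)
    mean = sum(values) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    if std == 0:
        return [0.0] * n
    return [(v - mean) / std for v in values]


def similarity_matrix(ts_a, ts_b, m):
    """Brute-force distance matrix between all length-m subsequences.

    Cell [i][j] holds the Euclidean distance between the z-normalized
    subsequences ts_a[i:i+m] and ts_b[j:j+m]. The cost is O(n_a * n_b * m),
    so this baseline is only usable on small inputs.
    """
    subs_a = [znorm(ts_a[i:i + m]) for i in range(len(ts_a) - m + 1)]
    subs_b = [znorm(ts_b[j:j + m]) for j in range(len(ts_b) - m + 1)]
    return [[math.dist(a, b) for b in subs_b] for a in subs_a]


def downsample_min(matrix, factor):
    """Coarse view: keep the smallest distance in each factor x factor block.

    Min-pooling is an illustrative assumption; it keeps the closest match in
    each block visible. Rows and columns that do not fill a block are dropped.
    """
    rows = len(matrix) // factor
    cols = len(matrix[0]) // factor
    return [
        [
            min(
                matrix[r * factor + dr][c * factor + dc]
                for dr in range(factor)
                for dc in range(factor)
            )
            for c in range(cols)
        ]
        for r in range(rows)
    ]


# Usage: self-join of a short sine wave, then a 4x coarser view.
ts = [math.sin(0.4 * i) for i in range(64)]
full = similarity_matrix(ts, ts, 8)
coarse = downsample_min(full, 4)
```

In a multiscale workflow of the kind the abstract describes, a user would inspect something like `coarse` first and recompute only the region of `full` they zoom in on, rather than materializing the whole matrix.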

Funder

National Science Foundation

Accenture

Mitsubishi Electric America Foundation

Publisher

Springer Science and Business Media LLC

Link

https://link.springer.com/content/pdf/10.1186/s40537-024-00954-1.pdf

References (45 articles)

1. Afonso LCS, Rosa GH, Pereira CR, et al. A recurrence plot-based approach for Parkinson’s disease identification. Future Gen Comput Syst. 2019;94:282–92. https://doi.org/10.1016/j.future.2018.11.054.

2. Alaee S, Mercer R, Kamgar K, Keogh E. Time series motifs discovery under DTW allows more robust discovery of conserved structure. Data Min Knowl Discov. 2021;35:1–48. https://doi.org/10.1007/s10618-021-00740-0.

3. Almeida-Ñauñay AF, Benito RM, Quemada M, et al. Recurrence plots for quantifying the vegetation indices dynamics in a semi-arid grassland. Geoderma. 2022;406:115488. https://doi.org/10.1016/j.geoderma.2021.115488.

4. Bonani JP, Fereres A, Garzo E, et al. Characterization of electrical penetration graphs of the Asian citrus psyllid, Diaphorina citri, in sweet orange seedlings. Entomol Exp Appl. 2009;134:35–49. https://doi.org/10.1111/j.1570-7458.2009.00937.x.

5. Chesnais Q, Mauck KE. Choice of tethering material influences the magnitude and significance of treatment effects in whitefly electrical penetration graph recordings. J Insect Behav. 2018;31:656–71. https://doi.org/10.1007/s10905-018-9705-x.