Abstract
Motivation: The similarity or distance measure used for clustering can generate intuitive and interpretable clusters when it is tailored to the unique characteristics of the data. In time series datasets, measurements such as gene expression levels or protein phosphorylation intensities are collected sequentially over time, and the similarity score should capture this special temporal structure.

Results: We propose a clustering similarity measure called Lag Penalized Weighted Correlation (LPWC) to group pairs of time series that exhibit closely related behaviors over time, even if the timing is not perfectly synchronized. LPWC aligns pairs of time series profiles to identify common temporal patterns. It down-weights aligned profiles based on the length of the temporal lags that are introduced. We demonstrate the advantages of LPWC versus existing time series and general clustering algorithms. In a simulated dataset based on the biologically motivated impulse model, LPWC is the only method to recover the true clusters for almost all simulated genes. LPWC also identifies distinct temporal patterns in our yeast osmotic stress response and axolotl limb regeneration case studies.

Availability: The LPWC R package is available at https://github.com/gitter-lab/LPWC and on CRAN under an MIT license.

Contact: chandereng@wisc.edu or gitter@biostat.wisc.edu

Supplementary information: Supplementary files are available online.
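
The idea of aligning two profiles and down-weighting the alignment by the lag it introduces can be illustrated with a minimal Python sketch. This is not the LPWC R package's API or its exact formula: the function names, the `max_lag` and `penalty` parameters, and the Gaussian-style penalty `exp(-penalty * lag**2)` are assumptions chosen for illustration, and the sketch uses plain Pearson correlation on the overlapping time points rather than a weighted correlation.

```python
import math


def pearson(a, b):
    """Plain Pearson correlation of two equal-length sequences."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b))
    var_a = sum((p - mean_a) ** 2 for p in a)
    var_b = sum((q - mean_b) ** 2 for q in b)
    if var_a == 0 or var_b == 0:
        return 0.0  # a constant profile carries no temporal pattern
    return cov / math.sqrt(var_a * var_b)


def lag_penalized_similarity(x, y, max_lag=1, penalty=0.5):
    """Return (best_score, best_lag) over integer lags in [-max_lag, max_lag].

    A positive lag means y is delayed relative to x by that many time points.
    The penalty form below is an illustrative assumption, not the published one.
    """
    n = len(x)
    best_score, best_lag = -math.inf, 0
    for lag in range(-max_lag, max_lag + 1):
        # Keep only the time points where the shifted profiles overlap.
        if lag >= 0:
            a, b = x[:n - lag], y[lag:]
        else:
            a, b = x[-lag:], y[:n + lag]
        if len(a) < 3:
            continue  # too few overlapping points for a meaningful correlation
        # Longer lags are down-weighted more strongly.
        score = math.exp(-penalty * lag ** 2) * pearson(a, b)
        if score > best_score:
            best_score, best_lag = score, lag
    return best_score, best_lag
```

For two profiles with the same shape but offset by one time point, a small `penalty` lets the shifted alignment win, while a large `penalty` favors the unshifted comparison even though its correlation is lower. The resulting pairwise scores could then be passed to any standard clustering routine.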
Publisher
Cold Spring Harbor Laboratory