Stumpy

Times Series Analysis

Author

Oren Bochman

Published

Thursday, August 8, 2024

Keywords

distance matrix, matrix profile, stumpy, data mining, time series analysis, motif discovery, discord detection, semantic segmentation, shapelet discovery

Stumpy - Time Series Analysis

Stumpy (Law 2019) is a powerful and scalable library for computing a matrix profile which can be used for a variety of time series data mining tasks.

Law, Sean M. 2019. STUMPY: A Powerful and Scalable Python Library for Time Series Data Mining.” The Journal of Open Source Software 4 (39): 1504.
———. 2021. “Modern Time Series Analysis with STUMPY.” https://youtu.be/XKNdXN-Jfmo.

This post is based on (Law 2021)

TLDR 🥜
  • Are you Stomped on time series analysis, Stumpy may be the library for you. 😀
  • Stumpy is a FOSS library with scalable algorithms for computing a matrix profile which can be used for a variety of time series data mining tasks.
  • It is easy to use and has a simple API that makes it easy to get started with time series analysis.

Introduction

. It is easy to use and has a simple API that makes it easy to get started with time series analysis.

Under the hood Stumpy calculates the pairwise euclidean distance between two subsequences of a time series. By considering all possible subsequences of a time series of a fixed length, we get a distance matrix which is a matrix of distances between subsequences. However the matrix profile is too slow to compute using brute force and is also memory intensive to store. The research behind Stumpy uses a number of optimizations to make the computation of the matrix profile which is list the nearest neighbors for each subsequence in a time series. This structure is O(n) in space and O(n^2) time complexity.

Note: before computing the matrix profile, Stumpy first normalises the subsequences to have zero mean and unit variance. This is done to ensure that the distance between subsequences is meaningful and that the matrix profile is accurate. 1

1 This is a common step in time series analysis and is done to remove any trends or seasonality in the data. It can also be disabled if needed.

why do we need to compute the matrix profile? The matrix profile is a powerful tool for time series analysis that can be used to identify patterns in the data.

Given the matrix profile, most time series data mining tasks are trivial or easy to solve in a few lines of code. - Emonnn Keogh

For example, motif discovery, discord discovery, semantic segmentation, and shapelet discovery can all be solved using the matrix profile.

By comparing the distance between subsequences, Stumpy can identify patterns in the data such as motifs, discords, chains, and other patterns. and then uses the matrix profile to identify motifs, discords, chains, and other patterns in the data.

Motivation

You get tasked with analyzing a time series data set there are many (10000+) points what do you do?

Magic Spells for Time Series Analysis:

  • Visualizations
    • great for small data sets
    • not suitable with more than 1000 data points
  • Statistics
    • traditional stats are not as meaningful for time series data
    • with many points we tend to care more about subsequences than points.
  • ARIMA (AutoRegressive Integrated Moving Average)
    • autoregressive model assuming some kind of repeating patterns
  • Anomaly Detection
    • assuming that we can agree what is normal
  • ML (Predictive Modeling)
    • assuming that we can predict the future
    • using features to predict values based on a projection or interpolation.
  • Forecasting
    • Using trends, historiacal and statistical data.
    • forecasting is for longer term frames
  • Clustering
    • grouping similar parts of a time series together
    • grouping similar time series together
  • Dynamic Time Warping
    • comparing time series that are not aligned
    • comparing time series that are not the same length
    • assuming we know enough about the time series to do this.
  • Change Detection
    • detecting when a time series changes regimes.

Stumpy’s Approach

Stumpy tries to answer two questions:

  1. Do any subsequences appear more than once in a time series?
  2. If they are such subsequences, what are they and where do they appear?

the design goals seem to be

  • be easy to interpret
  • use/data agnostic
  • no prior knowledge of the data
  • parameter free

Features

Stumpy has a number of features that make it a powerful tool for time series analysis. Some of the key features of Stumpy include:

  • High level features:
    • Motif Discovery
    • Discord Detection
    • TS Chains
    • Semantic Segmentation
    • Shapelet & Snippets
    • MPdist Clustering
    • Multi-dimensional TS
  • Low level features:
    • Fast and memory efficient computation of matrix profiles.
    • Support for both univariate and multivariate time series data.
    • Support for both fixed-length and variable-length time series data.
    • Support for both Euclidean and DTW distance measures.
    • Support for both exact and approximate matrix profile computation.
    • Support for both CPU and GPU computation.
    • Support for both online and offline matrix profile computation.

Big idea

Novelty

Papers

Stumpy is based on a number of research papers that have been published in the field of time series data mining. Some of the key papers that have inspired Stumpy include:

Yeh, Chin-Chia Michael, Yan Zhu, Liudmila Ulanova, Nurjahan Begum, Yifei Ding, Hoang Anh Dau, Diego Furtado Silva, Abdullah Mueen, and Eamonn Keogh. 2016. “Matrix Profile i: All Pairs Similarity Joins for Time Series: A Unifying View That Includes Motifs, Discords and Shapelets.” In 2016 IEEE 16th International Conference on Data Mining (ICDM), 1317–22. https://doi.org/10.1109/ICDM.2016.0179.

Zhu, Yan, et al. (2016) Matrix Profile II: Exploiting a Novel Algorithm and GPUs to Break the One Hundred Million Barrier for Time Series Motifs and Joins. ICDM:739-748.

Yeh, Chin-Chia Michael, et al. (2017) Matrix Profile VI: Meaningful Multidimensional Motif Discovery. ICDM:565-574.

Zhu, Yan, et al. (2017) Matrix Profile VII: Time Series Chains: A New Primitive for Time Series Data Mining. ICDM:695-704.

Gharghabi, Shaghayegh, et al. (2017) Matrix Profile VIII: Domain Agnostic Online Semantic Segmentation at Superhuman Performance Levels. ICDM:117-126.

Zhu, Yan, et al. (2017) Exploiting a Novel Algorithm and GPUs to Break the Ten Quadrillion Pairwise Comparisons Barrier for Time Series Motifs and Joins. KAIS:203-236.

Zhu, Yan, et al. (2018) Matrix Profile XI: SCRIMP++: Time Series Motif Discovery at Interactive Speeds. ICDM:837-846.

Yeh, Chin-Chia Michael, et al. (2018) Time Series Joins, Motifs, Discords and Shapelets: a Unifying View that Exploits the Matrix Profile. Data Min Knowl Disc:83-123.

Gharghabi, Shaghayegh, et al. (2018) “Matrix Profile XII: MPdist: A Novel Time Series Distance Measure to Allow Data Mining in More Challenging Scenarios.” ICDM:965-970.

Zimmerman, Zachary, et al. (2019) Matrix Profile XIV: Scaling Time Series Motif Discovery with GPUs to Break a Quintillion Pairwise Comparisons a Day and Beyond. SoCC ’19:74-86.

Akbarinia, Reza, and Betrand Cloez. (2019) Efficient Matrix Profile Computation Using Different Distance Functions. arXiv:1901.05708.

Kamgar, Kaveh, et al. (2019) Matrix Profile XV: Exploiting Time Series Consensus Motifs to Find Structure in Time Series Sets. ICDM:1156-1161.

Citation

BibTeX citation:
@online{bochman2024,
  author = {Bochman, Oren},
  title = {Stumpy},
  date = {2024-08-08},
  url = {https://orenbochman.github.io/posts/2024/2024-08-08-TS-with-Stumpy/},
  langid = {en}
}
For attribution, please cite this work as:
Bochman, Oren. 2024. “Stumpy.” August 8, 2024. https://orenbochman.github.io/posts/2024/2024-08-08-TS-with-Stumpy/.