Significant Pattern Mining for Time Series

(christian.bock.ml)

141 points | by cbock90 98 days ago

5 comments

  • bra-ket 98 days ago

    related: Matrix Profiles for time series https://www.cs.ucr.edu/~eamonn/MatrixProfile.html

    • uoaei 98 days ago

      See Stumpy for a handy library to get this working quickly (written in Python): https://github.com/TDAmeritrade/stumpy
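For readers new to the idea: a matrix profile stores, for every length-m window of a series, the distance to that window's nearest match elsewhere in the same series. Below is a brute-force, pure-Python sketch of that definition. It assumes z-normalized Euclidean distance and an exclusion zone of about m/4 that skips trivial self-matches, and the function names are hypothetical. It is far slower than STUMPY, whose `stumpy.stump(T, m)` is the library's entry point for this computation.

```python
import math
from typing import List, Sequence, Tuple

def znorm_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """z-normalized Euclidean distance between two equal-length windows."""
    m = len(a)

    def znorm(x: Sequence[float]) -> List[float]:
        mu = sum(x) / m
        sd = math.sqrt(sum((v - mu) ** 2 for v in x) / m)
        # Assumption: a constant window is mapped to all zeros instead of dividing by 0.
        return [0.0] * m if sd == 0 else [(v - mu) / sd for v in x]

    za, zb = znorm(a), znorm(b)
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(za, zb)))

def naive_matrix_profile(ts: Sequence[float], m: int) -> Tuple[List[float], List[int]]:
    """O(n^2 * m) matrix profile: nearest-neighbor distance and index for every window."""
    n = len(ts) - m + 1
    windows = [ts[i:i + m] for i in range(n)]
    excl = max(1, math.ceil(m / 4))  # windows closer than this are treated as trivial matches
    profile, index = [], []
    for i in range(n):
        best, best_j = math.inf, -1
        for j in range(n):
            if abs(i - j) < excl:
                continue
            d = znorm_dist(windows[i], windows[j])
            if d < best:
                best, best_j = d, j
        profile.append(best)
        index.append(best_j)
    return profile, index
```

On a perfectly periodic input such as `[0, 1, 2] * 4` with m = 3, every window has an exact repeat three steps away, so every profile value is 0. Low values mark repeated patterns (motifs); high values mark windows with no close match anywhere (discords).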

      • seanlaw 98 days ago

        Hi all, I am the creator of STUMPY and wanted to thank you for your interest. Please feel free to post questions on our Github issues and we'll try to assist where we can.

      • ashsriv 98 days ago

        I am still a little confused about the real-world application of the Matrix Profile. It looks really good, but once an MP is computed, then what?

        Can this be automated to say, for example, "Based on your window, here are all the anomalies"?
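One common answer to this question is to treat the windows with the largest matrix profile values (the "discords") as anomaly candidates, because they have no close match anywhere else in the series. The sketch below assumes the profile has already been computed, for instance as the first column of the array returned by `stumpy.stump`, converted to a list of floats. `find_discords`, its top-k cutoff, and the optional threshold are hypothetical choices, and picking the threshold that defines "all the anomalies" remains a judgment call for your data.

```python
import math
from typing import List, Optional, Sequence

def find_discords(profile: Sequence[float], m: int, k: int = 3,
                  threshold: Optional[float] = None) -> List[int]:
    """Start indices of up to k anomalous windows, largest profile value first.

    Windows that overlap an already reported discord (closer than m) are skipped
    so that one anomaly is not reported several times.
    """
    order = sorted(range(len(profile)), key=lambda i: profile[i], reverse=True)
    discords: List[int] = []
    for i in order:
        if len(discords) == k:
            break
        if not math.isfinite(profile[i]):
            continue  # window had no valid neighbor, so its score is meaningless
        if threshold is not None and profile[i] < threshold:
            break  # every remaining window is even less unusual
        if all(abs(i - j) >= m for j in discords):
            discords.append(i)
    return discords
```

Setting `threshold` turns the top-k list into "everything above this distance", which is closer to the "here are all the anomalies" behavior asked about.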

      • amai 98 days ago

        Don't forget: "Clustering of Time Series Subsequences is Meaningless": https://www.cs.ucr.edu/~eamonn/meaningless.pdf

        • Topolomancer 98 days ago

          But this is not about clustering. It's about figuring out to what extent a certain subclass of features, namely the 'shapelets', are statistically significantly associated with a pre-defined binary outcome.
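To make "statistically significantly associated with a binary outcome" concrete, here is a minimal sketch of testing a single candidate shapelet. A series counts as containing the shapelet if some window lies within distance theta of it, and Fisher's exact test then checks the resulting 2x2 table against the labels. The plain Euclidean distance, the threshold theta, and the two-sided Fisher test are assumptions for illustration, not necessarily the method from the linked post. A real miner tests many candidates and must correct for multiple testing, which this sketch omits.

```python
import math
from typing import List, Sequence

def min_dist(shapelet: Sequence[float], series: Sequence[float]) -> float:
    """Smallest Euclidean distance between the shapelet and any equal-length window."""
    m = len(shapelet)
    return min(math.dist(shapelet, series[i:i + m]) for i in range(len(series) - m + 1))

def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, col1, n = a + b, a + c, a + b + c + d

    def prob(x: int) -> float:
        # Hypergeometric probability that the top-left cell equals x, margins fixed.
        return math.comb(row1, x) * math.comb(n - row1, col1 - x) / math.comb(n, col1)

    observed = prob(a)
    lo, hi = max(0, row1 + col1 - n), min(row1, col1)
    probs = [prob(x) for x in range(lo, hi + 1)]
    # Sum every table at least as unlikely as the observed one (tolerance for rounding).
    return min(1.0, sum(p for p in probs if p <= observed * (1 + 1e-7)))

def shapelet_pvalue(shapelet: Sequence[float], series_list: List[Sequence[float]],
                    labels: List[int], theta: float) -> float:
    """p-value for association between 'shapelet occurs within theta' and a 0/1 label."""
    occurs = [min_dist(shapelet, s) <= theta for s in series_list]
    pairs = list(zip(occurs, labels))
    a = sum(o and y == 1 for o, y in pairs)          # occurs, label 1
    b = sum(o and y == 0 for o, y in pairs)          # occurs, label 0
    c = sum((not o) and y == 1 for o, y in pairs)    # absent, label 1
    d = sum((not o) and y == 0 for o, y in pairs)    # absent, label 0
    return fisher_exact_two_sided(a, b, c, d)
```

With three label-1 series that contain a spike and three label-0 flat series, the table is [[3, 0], [0, 3]] and the two-sided p-value is 0.1: even perfect separation is weak evidence with only six samples.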

          The paper you mentioned is interesting, though, because it shows an issue that many algorithms are prone to: if the number of samples/features gets too large, at some point you are only comparing _means_.

          (We are working on a paper to show the issues of this when it comes to time series classification.)

        • valyala 93 days ago

          Where should time series data be stored for further analysis? Prometheus can be used for this; see https://medium.com/@valyala/analyzing-prometheus-data-with-e...
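As a sketch of the analysis side: Prometheus exposes stored samples through its HTTP API, where `GET /api/v1/query_range` with `query`, `start`, `end` and `step` parameters returns a "matrix" result holding `[timestamp, "value"]` pairs for each series. The helper names, the `localhost:9090` address and the `disk_errors_total` metric are hypothetical, and the sample JSON used to exercise the parser is made up to show the response shape.

```python
import json
from typing import Dict, List, Tuple, Union
from urllib.parse import urlencode

def query_range_url(base_url: str, promql: str, start: float, end: float, step: str) -> str:
    """Build a /api/v1/query_range URL; start and end are Unix seconds, step e.g. '60s'."""
    params = urlencode({"query": promql, "start": start, "end": end, "step": step})
    return f"{base_url}/api/v1/query_range?{params}"

def parse_matrix(body: Union[str, bytes]) -> Dict[str, Tuple[List[float], List[float]]]:
    """Map each returned series (keyed by its sorted labels) to (timestamps, values)."""
    payload = json.loads(body)
    if payload.get("status") != "success":
        raise ValueError(f"query failed: {payload.get('error')}")
    out = {}
    for series in payload["data"]["result"]:
        key = ",".join(f"{k}={v}" for k, v in sorted(series["metric"].items()))
        timestamps = [float(t) for t, _ in series["values"]]
        values = [float(v) for _, v in series["values"]]  # sample values arrive as strings
        out[key] = (timestamps, values)
    return out
```

To fetch real data, the URL can be passed to `urllib.request.urlopen(url).read()` and the resulting bytes handed to `parse_matrix`; the lists it returns can then feed any of the analyses discussed in this thread.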

          • graycat 98 days ago

            The math in their description of the data is in error: they need to state that the T_i (T with subscript i), for i = 0, 1, 2, ..., n, are distinct.

            More standard would be a function d: {0, 1, ..., n} --> R^{1 x m} x {0, 1}.

            • Topolomancer 98 days ago

              This seems to be standard notation for time series classification to me, to be honest. I think the approach would also work if there are duplicates in the data, although the estimate would be overly optimistic, right?

              • graycat 98 days ago

                With their notation, they have not specified that the T's are unique. So a first fix would simply be to state that the T's are distinct. It would also help to say explicitly that i = 0, 1, 2, ... corresponds to increasing time. Moreover, is the data equally spaced in time? Likely yes, and in that case they should say so clearly.

                • jmmcd 98 days ago

                  No, i indexes the patient, not time. (T_0, y_0) is one patient's entire time series.
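To make the notation in this sub-thread concrete, here is a minimal sketch of the dataset layout, assuming every series has the same length m (matching the R^{1 x m} in the comment above) and a label in {0, 1}. Here i indexes patients, and each T_i holds one patient's whole series, assumed equally spaced in time. The class and function names are hypothetical; `has_duplicate_series` checks the distinctness condition raised above.

```python
from dataclasses import dataclass
from typing import Iterable, List, Sequence, Tuple

@dataclass(frozen=True)
class Sample:
    series: Tuple[float, ...]  # T_i: one patient's full series (assumed equally spaced)
    label: int                 # y_i: that patient's binary outcome

def make_dataset(rows: Iterable[Tuple[Sequence[float], int]]) -> List[Sample]:
    """Build [(T_0, y_0), ..., (T_n, y_n)]; the index i identifies a patient, not a time step."""
    data = [Sample(tuple(float(v) for v in series), int(label)) for series, label in rows]
    if not data:
        raise ValueError("empty dataset")
    m = len(data[0].series)
    for i, s in enumerate(data):
        if len(s.series) != m:
            raise ValueError(f"patient {i}: expected length {m}, got {len(s.series)}")
        if s.label not in (0, 1):
            raise ValueError(f"patient {i}: label must be 0 or 1")
    return data

def has_duplicate_series(data: List[Sample]) -> bool:
    """True if two patients share an identical series."""
    return len({s.series for s in data}) < len(data)
```

Duplicates are allowed by `make_dataset` on purpose; whether they should be rejected or merely flagged is exactly the point under debate.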

            • module0000 98 days ago

              This sure reads and looks like technical analysis indicators for time series data.

              It's useful, though. Example: when the 5-day moving average (MA) of disk errors rises above the 15-day MA, a failure is likely.
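A minimal sketch of the crossover rule described above, assuming one error count per day and simple (unweighted) trailing moving averages. The function names are hypothetical, and the 5- and 15-day windows come from the comment. Whether a crossover actually predicts a disk failure is the commenter's heuristic, not something this code establishes.

```python
from typing import List, Optional, Sequence

def sma(xs: Sequence[float], w: int) -> List[Optional[float]]:
    """Trailing simple moving average; None until w values are available."""
    return [sum(xs[t - w + 1:t + 1]) / w if t >= w - 1 else None for t in range(len(xs))]

def crossover_days(daily_errors: Sequence[float], short: int = 5, long: int = 15) -> List[int]:
    """Days on which the short MA moves from at-or-below to strictly above the long MA."""
    short_ma, long_ma = sma(daily_errors, short), sma(daily_errors, long)
    days = []
    for t in range(1, len(daily_errors)):
        if None in (short_ma[t - 1], long_ma[t - 1], short_ma[t], long_ma[t]):
            continue  # not enough history for both averages yet
        if short_ma[t - 1] <= long_ma[t - 1] and short_ma[t] > long_ma[t]:
            days.append(t)
    return days
```

For example, 20 error-free days followed by 3 errors a day gives a single crossover on day 20, the first day the 5-day average (0.6) exceeds the 15-day average (0.2).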