• bra-ket 7 days ago

related: Matrix Profiles for time series https://www.cs.ucr.edu/~eamonn/MatrixProfile.html

• uoaei 7 days ago

See Stumpy for a handy library to get this working quickly (written in Python): https://github.com/TDAmeritrade/stumpy

• seanlaw 6 days ago

Hi all, I am the creator of STUMPY and wanted to thank you for your interest. Please feel free to post questions on our GitHub issues and we'll try to assist where we can.

• ashsriv 6 days ago

I am still a little confused about the real-world application of MatrixProfile. It looks really good, but once an MP is made, then what?

Can this be automated to say, for example, "Based on your window, here are all the anomalies"?
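
One common way to automate this is to flag the subsequence with the largest matrix profile value (often called a "discord"), since it is the window whose nearest neighbor elsewhere in the series is farthest away. Below is a brute-force sketch in plain Python, not STUMPY's implementation. It assumes z-normalized Euclidean distance and an exclusion zone of m // 2 around each window; the function names are made up for illustration.

```python
import math

def znorm(window):
    """Z-normalize a window; a flat window maps to all zeros (an assumption)."""
    n = len(window)
    mean = sum(window) / n
    std = math.sqrt(sum((x - mean) ** 2 for x in window) / n)
    if std == 0:
        return [0.0] * n
    return [(x - mean) / std for x in window]

def naive_matrix_profile(series, m):
    """For each length-m window, distance to its nearest non-overlapping neighbor."""
    n_sub = len(series) - m + 1
    subs = [znorm(series[i:i + m]) for i in range(n_sub)]
    excl = max(1, m // 2)  # skip trivial matches near the window itself
    profile = []
    for i in range(n_sub):
        best = math.inf
        for j in range(n_sub):
            if abs(i - j) <= excl:
                continue
            d = math.sqrt(sum((a - b) ** 2 for a, b in zip(subs[i], subs[j])))
            best = min(best, d)
        profile.append(best)
    return profile

def top_discord(series, m):
    """Return (start index, profile value) of the most anomalous window."""
    profile = naive_matrix_profile(series, m)
    idx = max(range(len(profile)), key=profile.__getitem__)
    return idx, profile[idx]

if __name__ == "__main__":
    # Periodic toy signal with one injected spike at position 22.
    series = [0, 1, 0, -1] * 12
    series[22] = 5
    start, score = top_discord(series, m=4)
    print("most anomalous window starts at", start, "with score", round(score, 3))
```

This is O(n^2 * m), so it is only useful for seeing the idea on small inputs. To report "all the anomalies" rather than one, you would still need to pick a threshold or a top-k cutoff on the profile, and that choice depends on the data.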

• amai 6 days ago

Don't forget "Clustering of Time Series Subsequences is Meaningless": https://www.cs.ucr.edu/~eamonn/meaningless.pdf

• Topolomancer 6 days ago

But this is not about clustering. It's about figuring out to what extent a certain subclass of features, namely the 'shapelets', are statistically significantly associated with a pre-defined binary outcome.

The paper you mentioned is interesting, though, because it shows an issue that many algorithms are prone to: if the number of samples/features gets too large, at some point you are only comparing _means_.

(We are working on a paper to show the issues of this when it comes to time series classification.)

• valyala 1 day ago

Where should time series data be stored for further analysis? Prometheus is one option; see https://medium.com/@valyala/analyzing-prometheus-data-with-e...

• graycat 7 days ago

The math in their description of the data has an error: they need to state that the T_i (T with a subscript i), for i = 0, 1, 2, ..., n, are distinct.

More standard would be a function d: {0, 1, ..., n} --> R^{1 x m} x {0, 1}.

• Topolomancer 7 days ago

Seems to be standard terminology for time series classification to me, to be honest. I think the approach would also work if there are duplicates in the data. Although the estimate would be overly optimistic, right?

• graycat 6 days ago

With their notation they have not specified that the T's are unique. So a first fix would be just to state that the T's are distinct. And it would help to be explicit that i = 0, 1, 2, ... corresponds to increasing time. Moreover, is the data equally spaced in time? Likely yes, and in that case they should clearly say so.

• jmmcd 6 days ago

No, i indexes the patient, not time. (T_0, y_0) is one patient's entire time series.

• module0000 6 days ago

This sure reads and looks like technical analysis indicators for time series data.

It's useful though. Example: the 5-day MA of disk errors rising above the 15-day MA == likely failure.
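
A minimal sketch of that crossover rule in plain Python, assuming trailing simple moving averages over a list of daily error counts. The function names and the 5/15 defaults are illustrative and not taken from any monitoring tool.

```python
def moving_average(values, window):
    """Trailing simple moving average; None until `window` values are available."""
    out = []
    running = 0.0
    for i, v in enumerate(values):
        running += v
        if i >= window:
            running -= values[i - window]  # drop the value that left the window
        out.append(running / window if i >= window - 1 else None)
    return out

def crossover_days(values, short=5, long=15):
    """Indices where the short MA moves from at-or-below to above the long MA."""
    fast = moving_average(values, short)
    slow = moving_average(values, long)
    days = []
    for i in range(1, len(values)):
        if None in (fast[i], slow[i], fast[i - 1], slow[i - 1]):
            continue  # not enough history yet
        if fast[i - 1] <= slow[i - 1] and fast[i] > slow[i]:
            days.append(i)
    return days

if __name__ == "__main__":
    # Hypothetical daily disk error counts: steady, then a sudden jump.
    errors = [1] * 20 + [10] * 5
    print("short MA crossed above long MA on days:", crossover_days(errors))
```

Whether a crossover really means "likely failure" is a judgment about the hardware, not something the code can establish; in practice you would tune the windows against known failures.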