k-Shape: Efficient and Accurate Clustering of Time Series

John Paparrizos (jopa@cs.columbia.edu), Columbia University
Luis Gravano (gravano@cs.columbia.edu), Columbia University

This is the website for our ACM SIGMOD 2015 research paper, "k-Shape: Efficient and Accurate Clustering of Time Series."
The paper received the "ACM SIGMOD Research Highlight Award," and a summary of the paper appears in a SIGMOD Record "Research Highlights" special issue.
A substantially extended version appears as "Fast and Accurate Time-Series Clustering" in an ACM Transactions on Database Systems (TODS) "Best of SIGMOD 2015" special issue.

We make our source code publicly available here and provide details on how to obtain free access to the datasets used in our experimental results.

Datasets

We used the world's largest collection of class-labeled time-series datasets, namely the UCR Time-Series Repository.
To obtain free access to the datasets, please refer to the repository's website at http://www.cs.ucr.edu/~eamonn/time_series_data/.

Please note that some of the time-series datasets, namely Beef, Coffee, Cricket_X, Cricket_Y, Cricket_Z, Fish, OSULeaf, and OliveOil, are either not properly z-normalized or not z-normalized at all. Therefore, in our experiments we z-normalize every dataset ourselves, including those that are already normalized. Importantly, because of this issue, our results might differ from those reported in the literature, as several works assumed that all datasets are already z-normalized.
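
For readers who want to check or repeat this step on their own copy of the data, the following is a minimal Python sketch of z-normalization: each series is rescaled to zero mean and unit standard deviation. It is not taken from our Matlab code. The function names z_normalize and is_z_normalized, the tolerance values, and the use of the population standard deviation (dividing by n rather than n - 1) are assumptions made for illustration.

```python
import math


def _mean_and_std(series):
    """Return the mean and the population standard deviation of a series."""
    n = len(series)
    mean = sum(series) / n
    # Assumption: population standard deviation (divide by n, not n - 1).
    std = math.sqrt(sum((x - mean) ** 2 for x in series) / n)
    return mean, std


def z_normalize(series, eps=1e-12):
    """Return a copy of `series` with zero mean and unit standard deviation.

    A (near-)constant series has no meaningful scale, so it is mapped to
    all zeros instead of dividing by a standard deviation close to zero.
    """
    if not series:
        return []
    mean, std = _mean_and_std(series)
    if std < eps:
        return [0.0] * len(series)
    return [(x - mean) / std for x in series]


def is_z_normalized(series, tol=1e-6):
    """Check whether a non-empty series already has mean 0 and std 1."""
    mean, std = _mean_and_std(series)
    return abs(mean) < tol and abs(std - 1.0) < tol


if __name__ == "__main__":
    raw = [1.0, 2.0, 3.0, 4.0]
    print(is_z_normalized(raw))               # raw series is not normalized
    print(is_z_normalized(z_normalize(raw)))  # normalized copy passes the check
```

Running a check such as is_z_normalized over every series of a dataset is one way to see whether the dataset needs the extra normalization step. A dataset normalized with the n - 1 convention will narrowly fail this particular check, so the tolerance or the convention may need adjusting.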

With Prof. Eamonn Keogh we wrote a more detailed analysis on this matter here.

Source code

You can obtain the source code written in Matlab here.
Please contact the first author (jopa@cs.columbia.edu) to obtain the password.

References and BibTex

If you use our methods or code, please cite our papers:

Paparrizos, John, and Luis Gravano. "k-Shape: Efficient and Accurate Clustering of Time Series." In Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data, pp. 1855-1870. ACM, 2015. BibTex

Paparrizos, John, and Luis Gravano. "Fast and Accurate Time-Series Clustering." ACM Transactions on Database Systems (TODS) 42, no. 2 (2017): 8. BibTex

Implementations in other programming languages (not tested by the authors)

R package 'dtwclust' by Alexis Sarda-Espinosa

Python package 'tslearn' by Romain Tavenard
