John Paparrizos
Ph.D. Student, Department of Computer Science, Columbia University
John Paparrizos, Columbia University (jopa@cs.columbia.edu)
Luis Gravano, Columbia University (gravano@cs.columbia.edu)
This is the website for our ACM SIGMOD 2015 research paper, "k-Shape: Efficient and Accurate Clustering of Time Series."
The paper received the "ACM SIGMOD Research Highlight Award", and a summary of the paper appears in a SIGMOD Record "Research Highlights" special issue.
A substantially extended version appears as "Fast and Accurate Time-Series Clustering" in an ACM Transactions on Database Systems (TODS) "Best of SIGMOD 2015" special issue.
We make our source code publicly available here and provide details on how to obtain free access to the datasets used in our experimental results.
We used the world's largest collection of class-labeled time-series datasets, namely the UCR Time-Series Repository.
To obtain free access to the datasets please refer to the repository's website at http://www.cs.ucr.edu/~eamonn/time_series_data/.
Please note that some of the time-series datasets, namely Beef, Coffee, Cricket_X, Cricket_Y, Cricket_Z, Fish, OSULeaf, and OliveOil, are either not properly z-normalized or not z-normalized at all. Therefore, in our experiments we z-normalize all datasets ourselves. Importantly, because of this issue, our results might differ from those reported in the literature, as several works have assumed that all datasets are already z-normalized.
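To illustrate the z-normalization step mentioned above, here is a minimal Python sketch (not our MATLAB implementation). It rescales each series to mean 0 and standard deviation 1, and includes a simple check for whether a series is already z-normalized. The function names `z_normalize` and `is_z_normalized` and the tolerance `tol` are hypothetical. The sketch assumes the population standard deviation (dividing by n); other implementations may divide by n - 1 instead. It also assumes that a constant series, which has zero deviation, maps to all zeros.

```python
import math


def _mean_std(series):
    """Return the mean and population standard deviation (divide by n) of a series."""
    n = len(series)
    mean = sum(series) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in series) / n)
    return mean, std


def z_normalize(series):
    """Return a z-normalized copy of `series` with mean 0 and standard deviation 1.

    Assumption: a constant series (std == 0) is mapped to all zeros.
    """
    mean, std = _mean_std(series)
    if std == 0:
        return [0.0] * len(series)
    return [(v - mean) / std for v in series]


def is_z_normalized(series, tol=1e-6):
    """Check whether `series` already has mean ~0 and standard deviation ~1."""
    mean, std = _mean_std(series)
    return abs(mean) <= tol and abs(std - 1.0) <= tol


if __name__ == "__main__":
    raw = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]  # mean 5, std 2
    z = z_normalize(raw)
    print(z)                     # [-1.5, -0.5, -0.5, -0.5, 0.0, 0.0, 1.0, 2.0]
    print(is_z_normalized(raw))  # False
    print(is_z_normalized(z))    # True
```

Applying a check such as `is_z_normalized` to every series in a dataset is one way to spot the problem described above before running an experiment. Normalizing all datasets unconditionally, as we do, makes the results independent of how each dataset was distributed.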
Together with Prof. Eamonn Keogh, we wrote a more detailed analysis of this matter here.
You can obtain our source code, written in MATLAB, here.
Please contact the first author (jopa@cs.columbia.edu) to obtain the password.
If you use our methods or code, please cite our papers:
Paparrizos, John, and Luis Gravano. "k-Shape: Efficient and Accurate Clustering of Time Series." In Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data, pp. 1855-1870. ACM, 2015.
Paparrizos, John, and Luis Gravano. "Fast and Accurate Time-Series Clustering." ACM Transactions on Database Systems (TODS) 42, no. 2 (2017): 8.