Abstract :
We introduce a parsimonious model-based framework for clustering time course data. In these applications the computational burden often becomes an issue because of the large number of available observations. The measured time series can also be very noisy and sparse, and an appropriate model for them can be hard to define. We propose to model the observed measurements with P-spline smoothers and then to cluster the functional objects as summarized by their optimal spline coefficients. Depending on the characteristics of the observed measurements, our proposal can be combined with any suitable clustering method. In this paper we provide applications based on non-hierarchical clustering algorithms. We evaluate the accuracy and efficiency of our proposal through simulations and by analyzing two real data examples.
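The two-stage idea described in the abstract (smooth each series with a P-spline, then cluster the spline coefficients) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and not the authors' implementation: the function names `bspline_basis`, `pspline_coefficients` and `kmeans` are hypothetical, the smoothing parameter `lam` is fixed by hand instead of being selected optimally, the data are synthetic, and a plain k-means loop stands in for whichever non-hierarchical algorithm is used in the paper.

```python
import math
import numpy as np

def bspline_basis(x, n_segments=10, degree=3):
    """Equally spaced B-spline basis on [min(x), max(x)], built from
    differences of truncated power functions (assumes degree >= 1)."""
    lo, hi = x.min(), x.max()
    dx = (hi - lo) / n_segments
    knots = lo + dx * np.arange(-degree, n_segments + degree + 1)
    # Truncated powers (x - t)_+^degree, one column per knot
    diff = x[:, None] - knots[None, :]
    P = np.where(diff > 0, diff, 0.0) ** degree
    # Differencing the truncated powers yields the B-spline basis functions
    D = np.diff(np.eye(len(knots)), n=degree + 1, axis=0)
    scale = math.factorial(degree) * dx ** degree
    return (-1) ** (degree + 1) * (P @ D.T) / scale

def pspline_coefficients(Y, B, lam=1.0, diff_order=2):
    """Penalized least squares: solve (B'B + lam * D'D) a = B'y per series.
    Y has one series per row; lam is a hand-picked value (assumption)."""
    D = np.diff(np.eye(B.shape[1]), n=diff_order, axis=0)
    A = B.T @ B + lam * (D.T @ D)
    return np.linalg.solve(A, B.T @ Y.T).T  # one row of coefficients per series

def kmeans(X, k, n_iter=50, seed=0):
    """Plain Lloyd-style k-means with random initial centers (illustrative)."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels

# Synthetic example: two groups of noisy curves observed on a common grid
rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 50)
group_a = np.sin(2 * np.pi * t) + 0.2 * rng.standard_normal((20, t.size))
group_b = np.cos(2 * np.pi * t) + 0.2 * rng.standard_normal((20, t.size))
Y = np.vstack([group_a, group_b])

B = bspline_basis(t)                         # 50 x 13 basis matrix
coef = pspline_coefficients(Y, B, lam=1.0)   # 40 x 13 coefficient matrix
labels = kmeans(coef, k=2)                   # cluster the coefficients, not the raw series
print(labels)
```

In this sketch each series of 50 observations is summarized by 13 coefficients, so the clustering step works on a smaller and smoother representation than the raw measurements, which is the parsimony the abstract refers to.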
Disciplines :
Engineering, computing & technology: Multidisciplinary, general & others
Author, co-author :
Iorio, Carmela; University of Naples Federico II > Department of Economics and Statistics
Frasso, Gianluca; Université de Liège > Faculté des sciences sociales > Méthodes quantitatives en sciences sociales
D’Ambrosio, Antonio; University of Naples Federico II > Department of Economics and Statistics
Siciliano, Roberta; University of Naples Federico II > Department of Industrial Engineering
Language :
English
Title :
Parsimonious time series clustering using P-splines