This paper proposes new tests to compare two multivariate probability distributions. Since ranks do not exist canonically in ℝ^d, there is no natural multivariate generalisation of rank-based tests such as the two-sample Kolmogorov–Smirnov test. We therefore rely on recent measure transportation theory and on space-filling curves to reduce this d-dimensional problem to classical one-dimensional tests. To reduce computation time, we develop distribution-free tests, which avoids computing critical values for each particular problem. We demonstrate their theoretical validity and compare them with each other and with existing distribution-free techniques via extensive simulations. We show that they are computationally efficient and, in terms of power, outperform existing techniques most of the time. Finally, we apply the proposed tests to a dataset on the luminosity and temperature of stars.
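The abstract only outlines the pipeline. The following Python sketch illustrates the general idea for d = 2 under simplifying assumptions that are ours, not the authors'. Each coordinate is replaced by its componentwise pooled rank (a plain stand-in for the measure-transportation step, not the paper's construction), the cells of a 2^k grid are ordered by the classical Hilbert curve, and a two-sample Kolmogorov–Smirnov statistic is computed on the resulting sequence of sample labels. All function names are hypothetical, and the data are assumed continuous (no tied coordinates). Because the ordering depends only on the pooled sample, under the null hypothesis every arrangement of the labels is equally likely, so the null distribution depends only on the two sample sizes; here the p-value is approximated by Monte Carlo permutation of the labels.

```python
# Hypothetical sketch: componentwise ranks stand in for the measure-transportation step.
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def _rot(n: int, x: int, y: int, rx: int, ry: int) -> Tuple[int, int]:
    """Reflect/transpose a quadrant so its sub-curve has the canonical orientation."""
    if ry == 0:
        if rx == 1:
            x, y = n - 1 - x, n - 1 - y
        x, y = y, x
    return x, y


def hilbert_index(n: int, x: int, y: int) -> int:
    """Position of cell (x, y) along the Hilbert curve of an n-by-n grid (n a power of 2)."""
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if x & s else 0
        ry = 1 if y & s else 0
        d += s * s * ((3 * rx) ^ ry)
        x, y = _rot(n, x, y, rx, ry)
        s //= 2
    return d


def _grid_coords(values: Sequence[float], n: int) -> List[int]:
    """Replace each value by its pooled rank, rescaled to an integer cell index in [0, n)."""
    # Assumes no tied values (continuous data); ties would make the order label-dependent.
    order = sorted(range(len(values)), key=lambda i: values[i])
    coords = [0] * len(values)
    for rank, i in enumerate(order):
        coords[i] = rank * n // len(values)
    return coords


def curve_labels(xs: Sequence[Point], ys: Sequence[Point]) -> List[int]:
    """Sort the pooled sample along the curve; return sample labels (1 = xs, 0 = ys)."""
    pooled = list(xs) + list(ys)
    labels = [1] * len(xs) + [0] * len(ys)
    size = len(pooled)
    n = 1
    while n < size:  # n >= size keeps distinct ranks in distinct grid columns/rows
        n *= 2
    gx = _grid_coords([p[0] for p in pooled], n)
    gy = _grid_coords([p[1] for p in pooled], n)
    keys = [hilbert_index(n, gx[i], gy[i]) for i in range(size)]
    return [labels[i] for i in sorted(range(size), key=lambda i: keys[i])]


def ks_statistic(seq: Sequence[int]) -> float:
    """Two-sample KS distance between the label-1 and label-0 empirical CDFs along the curve."""
    m = sum(seq)
    k = len(seq) - m
    cx = cy = best = 0
    for lab in seq:
        if lab:
            cx += 1
        else:
            cy += 1
        best = max(best, abs(cx * k - cy * m))  # integer form of |cx/m - cy/k| * m * k
    return best / (m * k)


def permutation_pvalue(seq: Sequence[int], n_perm: int = 2000, seed: int = 0) -> float:
    """Monte Carlo p-value: under H0 every arrangement of the labels is equally likely."""
    rng = random.Random(seed)
    observed = ks_statistic(seq)
    perm = list(seq)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(perm)
        if ks_statistic(perm) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)
```

For example, `permutation_pvalue(curve_labels(xs, ys))`, with `xs` and `ys` lists of `(float, float)` pairs, returns an approximate p-value. Since the null distribution of the label sequence depends only on the two sample sizes, it could be simulated once and reused across problems, which is the kind of saving distribution-free statistics offer; the componentwise-rank step remains only a placeholder for the measure-transportation ranks the paper relies on.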
Disciplines:
Mathematics
Author, co-author:
Heuchenne, Cédric; Université de Liège - ULiège > HEC Liège : UER > UER Opérations : Statistique appliquée à la gestion et à l'économie; ISBA, Institute of Statistics, Biostatistics and Actuarial Sciences, Catholic University of Louvain, Louvain-la-Neuve, Belgium
Mordant, Gilles; ISBA, Institute of Statistics, Biostatistics and Actuarial Sciences, Catholic University of Louvain, Louvain-la-Neuve, Belgium; Institut für Mathematische Stochastik, Georg-August-Universität Göttingen, Göttingen, Germany
Language:
English
Title:
Using space filling curves to compare two multivariate distributions with distribution-free tests
Funders:
F.R.S.-FNRS - Fonds de la Recherche Scientifique [BE]
Funding text:
The authors warmly thank Dr. Ghosh for sharing the code of his test procedure. This work was supported by the FNRS, Fonds national de la recherche scientifique, Belgique, PDR/OL T.0080.16, 2016-2022.