Statistical matching using kernel canonical correlation analysis and super-organizing map

Annoye, Hugues; Beretta, Alessandro; Heuchenne, Cédric

doi:10.1016/j.eswa.2023.123134

Article (Scientific journals)

2024 • In Expert Systems with Applications, 246, p. 123134

Peer Reviewed verified by ORBi

Permalink
https://hdl.handle.net/2268/313652

Files

Full Text

Manuscript_preprint.pdf

Author postprint (511.86 kB)

Details

Keywords :

Artificial Intelligence; Computer Science Applications; General Engineering

Disciplines :

Engineering, computing & technology: Multidisciplinary, general & others

Author, co-author :

Annoye, Hugues

Beretta, Alessandro ; Université de Liège - ULiège > HEC Liège : UER > UER Opérations : Statistique appliquée à la gestion et à l'économie

Heuchenne, Cédric ; Université de Liège - ULiège > HEC Liège : UER > UER Opérations : Statistique appliquée à la gestion et à l'économie

Language :

English

Title :

Statistical matching using kernel canonical correlation analysis and super-organizing map

Publication date :

July 2024

Journal title :

Expert Systems with Applications

ISSN :

0957-4174

eISSN :

1873-6793

Publisher :

Elsevier

Volume :

246

Pages :

123134

Peer reviewed :

Peer Reviewed verified by ORBi

Additional URL :

https://api.elsevier.com/content/article/PII:S0957417423036382?httpAccept=text/xml

Funders :

Innoviris - Institut Bruxellois pour la Recherche et l'Innovation

Available on ORBi :

since 25 February 2024

Statistics

Number of views

241 (10 by ULiège)

Number of downloads

254 (13 by ULiège)
