Comparison of Kohonen's Self-Organizing Map algorithm and principal component analysis in the exploratory data analysis of a groundwater quality dataset

Peeters, Luk; Dassargues, Alain

Paper published in a book (Scientific congresses and symposiums)

2006

Permalink
https://hdl.handle.net/2268/3388

Details

Keywords :

groundwater quality; exploratory data analysis; principal component analysis; Self-Organizing Map algorithm; Kohonen's Self-Organizing Map; SOMs

Abstract :

Groundwater monitoring networks typically yield large, multivariate datasets. Analysis and interpretation of these datasets start with an exploratory data analysis in order to summarize the available data, extract useful information and formulate hypotheses for further research. Exploratory data analysis mostly focuses on finding related variables and groupings of similar observations. Traditionally, multivariate statistical techniques such as principal component analysis (PCA) are used for this purpose. In PCA, a linear dimensionality reduction of the original, high-dimensional dataset is carried out in order to identify orthogonal directions (principal components) of maximum variance in the dataset, based on linear combinations of correlated variables. Projections of the original data onto the subspace defined by the principal components can be used to identify groups in the data and to reveal relationships between variables (Davis, 1986). In this study, principal component analysis is compared to Kohonen's self-organizing map (SOM) algorithm. The SOM algorithm is an artificial neural network technique designed to carry out a non-parametric regression process; it is mainly used to represent high-dimensional, nonlinearly related data items in a topology-preserving, often two-dimensional display, and to perform unsupervised classification and clustering (Kohonen, 1995). Both PCA and SOM are applied to a hydrochemical dataset from a monitoring network in two sandy, phreatic aquifers in Central Belgium. The monitoring network consists of 47 monitoring wells, each equipped with three filters at different depths, in which 14 variables are measured. The first aquifer, the Diest sands aquifer, is of Late Miocene age and consists of coarse, glauconiferous sands and sandstones (Laga et al., 2001).
The second aquifer, the Brussels sands aquifer, is of Middle Eocene age and is a heterogeneous formation consisting of an alternation of highly and poorly calcareous sands, locally silicified (Laga et al., 2001). Both techniques succeed in distinguishing between the two aquifers and reveal the relationships between variables. The main advantages of PCA are the mathematical quantification of the correlation between variables and the expression of the original data in the subspace defined by the principal components. The visualization of the SOM analysis, on the other hand, allows a straightforward interpretation of the dataset structure, in which even nonlinear relationships between variables can be identified. Additionally, unlike PCA, the SOM algorithm can handle a limited number of missing values in the dataset.
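
The two techniques compared above can be sketched in a few lines of Python with NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names (standardize, pca, train_som, best_matching_units), the map size, the learning-rate and neighborhood schedules, and the synthetic two-group dataset are all hypothetical. PCA is computed here from the eigen-decomposition of the covariance matrix of standardized data and needs a complete table. The SOM is a basic online Kohonen map with a Gaussian neighborhood; it copes with missing values by ignoring NaN components when searching for the best-matching unit and when updating, which is one common way of handling gaps.

```python
import numpy as np


def standardize(X):
    """Z-score each variable (column), ignoring NaNs, so that variables
    measured in different units are put on a common scale."""
    return (X - np.nanmean(X, axis=0)) / np.nanstd(X, axis=0)


def pca(X, n_components=2):
    """Classical PCA via eigen-decomposition of the covariance matrix.

    X must be complete (no NaNs). Returns the scores (projections onto the
    principal components), the loadings and the explained-variance ratios.
    """
    Xc = X - X.mean(axis=0)
    eigvals, eigvecs = np.linalg.eigh(np.cov(Xc, rowvar=False))
    order = np.argsort(eigvals)[::-1]            # largest variance first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    loadings = eigvecs[:, :n_components]
    scores = Xc @ loadings
    return scores, loadings, eigvals[:n_components] / eigvals.sum()


def train_som(X, rows=6, cols=6, n_iter=2000, lr0=0.5, seed=0):
    """Minimal online Kohonen SOM on a rectangular grid (assumed settings).

    NaN entries are skipped both in the best-matching-unit search and in the
    update, so samples with a few missing values can still be used.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    grid = np.array([(i, j) for i in range(rows) for j in range(cols)], dtype=float)
    W = rng.normal(0.0, 0.1, size=(rows * cols, d))   # codebook vectors
    sigma0 = max(rows, cols) / 2.0
    for t in range(n_iter):
        x = X[rng.integers(n)]
        mask = ~np.isnan(x)
        frac = t / n_iter
        lr = lr0 * (1.0 - frac)                   # linearly decaying learning rate
        sigma = max(sigma0 * (1.0 - frac), 0.5)   # shrinking neighborhood radius
        # Best-matching unit, using only the observed components of x.
        bmu = np.argmin(np.sum((W[:, mask] - x[mask]) ** 2, axis=1))
        # Gaussian neighborhood measured on the 2-D map grid.
        h = np.exp(-np.sum((grid - grid[bmu]) ** 2, axis=1) / (2.0 * sigma ** 2))
        W[:, mask] += lr * h[:, None] * (x[mask] - W[:, mask])
    return W, grid


def best_matching_units(X, W):
    """Index of the best-matching unit of every sample (NaN-aware)."""
    out = []
    for x in X:
        mask = ~np.isnan(x)
        out.append(int(np.argmin(np.sum((W[:, mask] - x[mask]) ** 2, axis=1))))
    return np.array(out)


# Hypothetical synthetic data standing in for a hydrochemical table:
# two groups ("aquifers") of 30 samples x 14 variables, differing in 7 of them.
rng = np.random.default_rng(42)
group_a = rng.normal(0.0, 1.0, size=(30, 14))
group_b = rng.normal(0.0, 1.0, size=(30, 14))
group_b[:, :7] += 5.0
X = standardize(np.vstack([group_a, group_b]))
labels = np.array([0] * 30 + [1] * 30)

scores, loadings, evr = pca(X)            # PCA needs the complete table

X_gappy = X.copy()
X_gappy[[3, 41], [2, 10]] = np.nan        # a few missing values for the SOM
W, grid = train_som(X_gappy)
bmus = best_matching_units(X_gappy, W)
```

With this synthetic data the first principal component separates the two groups (their mean PC1 scores have opposite signs), and the SOM should place the two groups in different regions of the map even though two input values are missing.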

Research Center/Unit :

Aquapôle - ULiège

Disciplines :

Geological, petroleum & mining engineering

Author, co-author :

Peeters, Luk; Katholieke Universiteit Leuven - KUL > Geologie-Geografie > Hydrogeologie en Ingenieursgeologie

Dassargues, Alain; Université de Liège - ULiège > Département Argenco : Secteur GEO3 > Hydrogéologie & Géologie de l'environnement

Language :

English

Publication date :

2006

Event name :

GeoENV2006

Event date :

2006

Audience :

International

Available on ORBi :

since 03 January 2009
