Discussing the validation of high-dimensional probability distribution learning with mixtures of graphical models for inference

Schnitzler, François

Poster (Scientific congresses and symposiums)

2010 • Validation in Statistics and Machine Learning

Permalink
https://hdl.handle.net/2268/82775

Details

Keywords :

Bayesian networks; Markov trees; Chow-Liu algorithm; mixture of trees

Abstract :

[en] Exact inference on probabilistic graphical models quickly becomes intractable as the dimension of the problem increases. A weighted average (or mixture) of several simple graphical models can be learned in place of a single more complicated model, which makes probabilistic inference much more efficient. I hope to discuss issues related to the validation of algorithms for learning such mixtures of models, and to high-dimensional learning of probabilistic graphical models in general, and to gather feedback and comments on my approach. The main problems are the difficulty of assessing the accuracy of the algorithms and of choosing a representative set of target distributions. The accuracy of algorithms for learning probabilistic graphical models is often evaluated by comparing the structure of the resulting model to that of the target (e.g. number of similar/dissimilar edges, BDe score, etc.). This approach, however, falls short when studying methods that use a mixture of simple models: individually, these lack the representational power to model the true distribution, and only their combination allows them to compete with more sophisticated models. The Kullback-Leibler divergence is a measure of the difference between two probability densities, and can be used to compare any model learned from a dataset to the data-generating distribution. For computational reasons, however, I had to resort to a Monte Carlo estimate of this quantity for large problems (starting at around 200 variables). Since probabilistic inference, not probability modelling, is the ultimate motivation for building these models, a more meaningful measure of accuracy could be obtained by comparing mixtures against a combination of state-of-the-art model learning and approximate inference algorithms.
However, the exact inference result cannot easily be obtained for interesting target distributions, since mixtures are considered precisely because exact inference is not possible on those targets, and approximate inference would introduce a bias. Selecting the target distributions used to generate the data sets on which the algorithms are evaluated also proved a challenge. The easiest solution was to generate them at random (although other approaches can be designed). Such models, however, are likely to differ considerably from real problems, and are thus a poor choice for assessing the practical interest of mixtures of models. Methods (e.g. linking multiple copies of a given network) have been developed to increase the size of models known to the community (e.g. the ALARM network), and the resulting graphical models have been made available. These could, however, still be far from the kind of interactions present in a real setting. A better way to proceed could be to generate samples from the equations describing a physical problem, to learn the best possible probabilistic model from this high-dimensional dataset, and to use it as the target distribution.
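
The Monte Carlo estimate of the Kullback-Leibler divergence mentioned in the abstract can be illustrated with a minimal sketch. It is not the author's code: it assumes a binary Markov chain as a stand-in for a Markov tree, uses a mixture of such chains as the learned model, and estimates KL(P || Q) as the average of log P(x) - log Q(x) over samples x drawn from the target P. All function names and parameter values (`sample_chain`, `chain_log_prob`, `mixture_log_prob`, `mc_kl`, `p_init`, `p_flip`) are hypothetical.

```python
import numpy as np

def sample_chain(n_samples, n_vars, p_init, p_flip, rng):
    """Sample a binary chain X1 -> X2 -> ... (assumed stand-in for a Markov tree)."""
    x = np.empty((n_samples, n_vars), dtype=int)
    x[:, 0] = rng.random(n_samples) < p_init
    for j in range(1, n_vars):
        # Each variable copies its parent, flipped with probability p_flip.
        flip = rng.random(n_samples) < p_flip
        x[:, j] = np.where(flip, 1 - x[:, j - 1], x[:, j - 1])
    return x

def chain_log_prob(x, p_init, p_flip):
    """Log-probability of each row of x under the chain model."""
    logp = np.where(x[:, 0] == 1, np.log(p_init), np.log1p(-p_init))
    flips = x[:, 1:] != x[:, :-1]
    return logp + np.where(flips, np.log(p_flip), np.log1p(-p_flip)).sum(axis=1)

def mixture_log_prob(x, weights, components):
    """Log-probability under a weighted average (mixture) of chain models."""
    comp = np.stack([chain_log_prob(x, pi, pf) for pi, pf in components])
    # log sum_k w_k * P_k(x), computed stably in log space.
    return np.logaddexp.reduce(np.log(weights)[:, None] + comp, axis=0)

def mc_kl(target, weights, components, n_vars, n_samples, rng):
    """Monte Carlo estimate of KL(target || mixture) from samples of the target."""
    x = sample_chain(n_samples, n_vars, *target, rng)
    log_p = chain_log_prob(x, *target)
    log_q = mixture_log_prob(x, weights, components)
    return float(np.mean(log_p - log_q))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    estimate = mc_kl(
        target=(0.5, 0.1),                      # illustrative target parameters
        weights=np.array([0.5, 0.5]),           # illustrative mixture weights
        components=[(0.5, 0.2), (0.5, 0.3)],    # illustrative mixture terms
        n_vars=200,
        n_samples=10_000,
        rng=rng,
    )
    print(f"Estimated KL divergence: {estimate:.3f}")
```

The estimator only needs samples from the target and pointwise log-probabilities under both models, which is why it remains usable when the exact sum over all 2^n configurations is out of reach. Its variance depends on the number of samples, so in practice the estimate would be reported together with a standard error.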

Research Center/Unit :

Systèmes et Modélisation

Disciplines :

Computer science

Author, co-author :

Schnitzler, François ; Université de Liège - ULiège > Dép. d'électric., électron. et informat. (Inst.Montefiore) > Systèmes et modélisation

Language :

English

Publication date :

06 October 2010

Event name :

Validation in Statistics and Machine Learning

Event organizer :

Nicole Krämer
Anne-Laure Boulesteix

Event place :

Berlin, Germany

Event date :

from 06-10-2010 to 07-10-2010

Audience :

International

Funders :

FRIA - Fonds pour la Formation à la Recherche dans l'Industrie et dans l'Agriculture
Biomagnet IUAP network of the Belgian Science Policy Office
Pascal2 network of excellence of the EC

Available on ORBi :

since 25 January 2011
