Poster (Scientific congresses and symposiums)
Discussing the validation of high-dimensional probability distribution learning with mixtures of graphical models for inference
Schnitzler, François
2010, Validation in Statistics and Machine Learning
 


Details



Keywords :
Bayesian networks; Markov trees; Chow-Liu algorithm; mixture of trees
Abstract :
[en] Exact inference on probabilistic graphical models quickly becomes intractable as the dimension of the problem increases. A weighted average (or mixture) of several simple graphical models can be learned in place of a single, more complicated model, which makes probabilistic inference much more efficient. I hope to discuss issues related to the validation of algorithms for learning such mixtures of models, and to high-dimensional learning of probabilistic graphical models in general, and to gather feedback and comments on my approach. The two main problems are the difficulty of assessing the accuracy of the algorithms and the difficulty of choosing a representative set of target distributions.

The accuracy of algorithms for learning probabilistic graphical models is often evaluated by comparing the structure of the resulting model to that of the target (e.g. the number of shared and differing edges, the BDe score, etc.). This approach falls short, however, for methods that use a mixture of simple models: individually, these models lack the representational power to model the true distribution, and only their combination allows them to compete with more sophisticated models. The Kullback-Leibler divergence measures the difference between two probability densities and can be used to compare any model learned from a dataset to the data-generating distribution. For computational reasons, however, I had to resort to a Monte Carlo estimate of this quantity for large problems (starting at around 200 variables). Since probabilistic inference, and not probability modelling, is the ultimate motivation for building these models, a more meaningful measure of accuracy could be obtained by comparing mixtures against a combination of state-of-the-art model learning and approximate inference algorithms. However, the exact inference result cannot easily be obtained for interesting target distributions: mixtures are considered precisely because exact inference is not possible on these targets, and using approximate inference as the reference would introduce a bias.

Selecting the target distributions used to generate the datasets on which the algorithms are evaluated also proved challenging. The easiest solution was to generate them at random (although different approaches can be designed). Such models are, however, likely to be rather different from real problems, and thus constitute a poor choice for assessing the practical interest of mixtures of models. Methods (e.g. linking multiple copies of a given network) have been developed to increase the size of models known to the community (e.g. the ALARM network), and the resulting graphical models have been made available. These could nevertheless still be far from the kind of interactions present in a real setting. A better way to proceed could be to generate samples from the equations describing a physical problem, to learn a probabilistic model as well as possible from this high-dimensional dataset, and to use it as the target distribution.
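
The Monte Carlo estimate of the Kullback-Leibler divergence mentioned in the abstract can be illustrated with a minimal sketch. It is not the code used for the poster. The target here is assumed to be a small binary Markov chain, and the learned model is assumed to be a mixture of fully factorised Bernoulli models rather than a mixture of Markov trees. All function names and parameter values are hypothetical. The problem is kept small (8 variables) so that the estimate can be checked against the exact divergence by enumeration, which is no longer feasible at the sizes discussed above.

```python
import itertools
import math
import random


def chain_log_prob(x, p_first=0.5, p_stay=0.8):
    """Log-probability of a binary tuple x under a first-order Markov chain."""
    lp = math.log(p_first if x[0] == 1 else 1.0 - p_first)
    for prev, cur in zip(x, x[1:]):
        lp += math.log(p_stay if cur == prev else 1.0 - p_stay)
    return lp


def sample_chain(n, rng, p_first=0.5, p_stay=0.8):
    """Draw one binary tuple of length n from the same Markov chain."""
    x = [1 if rng.random() < p_first else 0]
    for _ in range(n - 1):
        x.append(x[-1] if rng.random() < p_stay else 1 - x[-1])
    return tuple(x)


def mixture_log_prob(x, weights, components):
    """Log-probability under a weighted average of factorised Bernoulli models."""
    terms = []
    for w, probs in zip(weights, components):
        lp = math.log(w)
        for xi, p in zip(x, probs):
            lp += math.log(p if xi == 1 else 1.0 - p)
        terms.append(lp)
    top = max(terms)  # log-sum-exp for numerical stability
    return top + math.log(sum(math.exp(t - top) for t in terms))


def kl_monte_carlo(sample_p, log_p, log_q, n_samples):
    """Estimate KL(P || Q) as the mean of log P(x) - log Q(x) over x ~ P."""
    total = 0.0
    for _ in range(n_samples):
        x = sample_p()
        total += log_p(x) - log_q(x)
    return total / n_samples


def kl_exact(n, log_p, log_q):
    """Exact KL(P || Q) by enumerating all 2**n states (small n only)."""
    total = 0.0
    for x in itertools.product((0, 1), repeat=n):
        lp = log_p(x)
        total += math.exp(lp) * (lp - log_q(x))
    return total


# Illustrative setup (assumed values): 8 variables, two mixture components.
n = 8
rng = random.Random(0)
weights = [0.5, 0.5]
components = [[0.8] * n, [0.2] * n]


def log_q(x):
    return mixture_log_prob(x, weights, components)


estimate = kl_monte_carlo(lambda: sample_chain(n, rng), chain_log_prob, log_q, 20000)
exact = kl_exact(n, chain_log_prob, log_q)
print(f"Monte Carlo: {estimate:.4f}  exact: {exact:.4f}")
```

The estimator only needs samples from the target and pointwise log-probabilities under both models, so its cost grows with the number of samples rather than with the 2^n states that the exact sum requires.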
Research center :
Systèmes et Modélisation
Disciplines :
Computer science
Author, co-author :
Schnitzler, François ;  Université de Liège - ULiège > Dép. d'électric., électron. et informat. (Inst.Montefiore) > Systèmes et modélisation
Language :
English
Title :
Discussing the validation of high-dimensional probability distribution learning with mixtures of graphical models for inference
Publication date :
06 October 2010
Number of pages :
A0
Event name :
Validation in Statistics and Machine Learning
Event organizer :
Nicole Krämer
Anne-Laure Boulesteix
Event place :
Berlin, Germany
Event date :
from 06-10-2010 to 07-10-2010
Audience :
International
Funders :
FRIA - Fonds pour la Formation à la Recherche dans l'Industrie et dans l'Agriculture [BE]
Biomagnet IUAP network of the Belgian Science Policy Office
Pascal2 network of excellence of the EC
Available on ORBi :
since 25 January 2011
