Advances in Simulation-Based Inference: Towards the automation of the Scientific Method through Learning Algorithms

2022

thesis.pdf

Author postprint (5.51 MB)

All documents in ORBi are protected by a user license.

Keywords :

deep learning; machine learning; statistics; approximate inference; likelihood-free; simulation-based

Abstract :

[en] This dissertation presents several novel techniques and guidelines to advance the field of simulation-based inference. Simulation-based inference, or likelihood-free inference, refers to statistical inference in settings where simulating synthetic realizations x through detailed descriptions of their generating processes is possible, but evaluating the likelihood p(x | y) of parameters y tied to realizations x is intractable. In practice, this means that while it is relatively simple to execute a computer simulation and collect samples from its generative process for various inputs y, it is rather difficult to invert the process and answer the question: "What set of parameters y could have been responsible for producing x, and what is their probability of doing so?"
The likelihood p(x | y) plays a central role in answering this question. However, for most scientific simulators, the direct evaluation of the (true and unknown) likelihood involves solving an inverse problem that rests on the integration of all possible forward realizations implicitly defined by the computer code of the simulator. This issue is the core reason why it is typically impossible to evaluate the likelihood model of a computer simulator: it requires us to integrate across all possible code paths for all inputs y that could have potentially led to the realization x.
For this reason, classical statistical inference based on the likelihood is impractical. Nevertheless, approximate inference remains possible by relying on surrogates that produce estimates of key quantities necessary for statistical inference. This thesis introduces various techniques and guidelines to effectively construct such surrogates and demonstrates how these approximations should be applied reliably. We explicitly make the point that the dogma of data efficiency should not be central to the field. Rather, the reliability of approximations should be, if we are ever to deduce scientific results with the techniques developed over the years. This point is strengthened by demonstrating that all techniques can produce approximations that are not reliable from a scientific point of view, that is, when one is interested in constraining parameters or models. We argue for novel protocols that provide theoretically backed reliability properties. To that end, this thesis introduces a novel algorithm that provides such guarantees in terms of the binary classifier. In fact, the theoretical result is applicable to any binary classification problem.
Finally, these contributions are framed within the context of the automation of science. This thesis concerns itself with the automation of the last step of the scientific method, which is described as a recurrence over the sequence of hypothesis, experiment, and conclusion. For the most part, however, the steps of hypothesis formation and experiment design remain solely for the scientist to decide. Only occasionally are they explored, designed, and automated through computer-assisted means. For these two steps, we provide research avenues and proofs of concept that could unlock their automation.
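
The likelihood-free setting described in the abstract can be illustrated with a minimal sketch. It assumes a toy one-parameter simulator and uses rejection-based approximate Bayesian computation, a classical baseline; it is not the algorithm introduced in the thesis. The names `simulator` and `rejection_abc`, the uniform prior, and the tolerance `epsilon` are all hypothetical choices made for illustration.

```python
import random


def simulator(y, rng):
    """Toy forward model (an assumption): easy to sample, latent z stays hidden."""
    # The latent variable z stands in for the internal randomness of a simulator.
    z = rng.gauss(0.0, 1.0)
    # The observation depends on the parameter y, the latent z, and extra noise.
    return y + 0.5 * z + rng.gauss(0.0, 0.1)


def rejection_abc(x_obs, n_draws, epsilon, rng):
    """Keep prior draws of y whose simulated x lands within epsilon of x_obs."""
    accepted = []
    for _ in range(n_draws):
        y = rng.uniform(-5.0, 5.0)  # assumed uniform prior over y
        x = simulator(y, rng)       # forward simulation only, no likelihood call
        if abs(x - x_obs) < epsilon:
            accepted.append(y)
    return accepted


rng = random.Random(0)  # fixed seed so the run is repeatable
x_obs = 1.5             # hypothetical observed realization
posterior_samples = rejection_abc(x_obs, n_draws=50_000, epsilon=0.1, rng=rng)
posterior_mean = sum(posterior_samples) / len(posterior_samples)
print(len(posterior_samples), round(posterior_mean, 2))
```

The sketch never evaluates p(x | y); it only runs the simulator forward and keeps the parameters whose output resembles the observation. The accepted values of y approximate the posterior, at the cost of many discarded simulations.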

Disciplines :

Computer science

Physics

Author :

Hermans, Joeri; Université de Liège - ULiège > Montefiore Institute of Electrical Engineering and Computer Science

Language :

English

Title :

Advances in Simulation-Based Inference: Towards the automation of the Scientific Method through Learning Algorithms

Defense date :

2022

Institution :

ULiège - University of Liège [Faculty of Applied Sciences], Belgium

Degree :

Doctor of Philosophy in Computer Science

Promotor :

Louppe, Gilles; Université de Liège - ULiège > Département d'électricité, électronique et informatique (Institut Montefiore) > Big Data

President :

Geurts, Pierre; Université de Liège - ULiège > Montefiore Institute of Electrical Engineering and Computer Science

Jury member :

Tomczak, Jakub; VU - Vrije Universiteit Amsterdam

Wehenkel, Louis; Université de Liège - ULiège > Montefiore Institute of Electrical Engineering and Computer Science

Weniger, Christophe; UvA - University of Amsterdam

Funders :

F.R.S.-FNRS - Fund for Scientific Research [BE]

Funding number :

FRIA 27575

Available on ORBi :

since 05 April 2022