References of "Sutera, Antonio"
Peer Reviewed
Random Forests based group importance scores and their statistical interpretation: application for Alzheimer’s disease
Wehenkel, Marie ULiege; Sutera, Antonio ULiege; Bastin, Christine ULiege et al

in Frontiers in Neuroscience (2018), 12

Machine learning approaches have been increasingly used in the neuroimaging field for the design of computer-aided diagnosis systems. In this paper, we focus on the ability of these methods to provide interpretable information about the brain regions that are the most informative about the disease or condition of interest. In particular, we investigate the benefit of group-based, instead of voxel-based, analyses in the context of random forests. Assuming a prior division of the voxels into non-overlapping groups (defined by an atlas), we propose several procedures to derive group importances from individual voxel importances derived from random forest models. We then adapt several permutation schemes to turn group importance scores into more interpretable statistical scores that make it possible to determine the truly relevant groups in the importance rankings. The good behavior of these methods is first assessed on artificial datasets. Then, they are applied to our own dataset of FDG-PET scans to identify the brain regions involved in the prognosis of Alzheimer's disease.
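The step of deriving group importances from voxel importances can be illustrated with a minimal sketch. The abstract does not say which aggregation procedures the authors use, so the three reducers below (sum, mean, max) and the function name `group_importances` are illustrative assumptions, not the paper's method.

```python
from collections import defaultdict


def group_importances(voxel_importances, voxel_groups, how="sum"):
    """Aggregate per-voxel importances into one score per atlas group.

    `how` selects a hypothetical aggregation rule: "sum", "mean" or "max".
    """
    # Collect the importance scores of the voxels belonging to each group.
    per_group = defaultdict(list)
    for score, group in zip(voxel_importances, voxel_groups):
        per_group[group].append(score)

    reducers = {
        "sum": sum,
        "mean": lambda scores: sum(scores) / len(scores),
        "max": max,
    }
    reduce_scores = reducers[how]
    return {group: reduce_scores(scores) for group, scores in per_group.items()}


# Toy data: four voxels assigned to two non-overlapping groups.
voxel_importances = [0.1, 0.3, 0.2, 0.4]
voxel_groups = ["A", "A", "B", "B"]

by_sum = group_importances(voxel_importances, voxel_groups, how="sum")
by_max = group_importances(voxel_importances, voxel_groups, how="max")
```

Summing favours large groups while the mean and max do not, which is one reason a statistical correction such as the permutation schemes mentioned in the abstract can be useful when comparing groups of different sizes.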

Peer Reviewed
Phase Identification of Smart Meters by Clustering Voltage Measurements
Olivier, Frédéric ULiege; Sutera, Antonio ULiege; Geurts, Pierre ULiege et al

in Proceedings of the XX Power Systems Computation Conference (PSCC 2018) (2018, June)

When a smart meter, be it single-phase or three-phase, is connected to a three-phase network, the phase(s) to which it is connected is (are) initially not known. This means that each of its measurements is not uniquely associated with a phase of the distribution network. This phase information is important because it can be used by Distribution System Operators to take actions in order to have a network that is more balanced. In this work, the correlation between the voltage measurements of the smart meters is used to identify the phases. To do so, the constrained k-means clustering method is first introduced as a reference, as it has been previously used for phase identification. A novel, automatic and effective method is then proposed to overcome the main drawback of the constrained k-means clustering, and improve the quality of the clustering. Indeed, it takes into account the underlying structure of the low-voltage distribution networks beneath the voltage measurements without a priori knowledge of the topology of the network. Both methods are analysed with real measurements from a distribution network in Belgium. The proposed algorithm shows superior performance in different settings, e.g. when the ratio of single-phase to three-phase meters in the network is high, when the period over which the voltages are averaged is longer than one minute, etc.

Peer Reviewed
Random Subspace with Trees for Feature Selection Under Memory Constraints
Sutera, Antonio ULiege; Châtel, Célia; Louppe, Gilles ULiege et al

in Storkey, Amos; Perez-Cruz, Fernando (Eds.) Proceedings of the Twenty-First International Conference on Artificial Intelligence and Statistics (2018)

Dealing with datasets of very high dimension is a major challenge in machine learning. In this paper, we consider the problem of feature selection in applications where the memory is not large enough to contain all features. In this setting, we propose a novel tree-based feature selection approach that builds a sequence of randomized trees on small subsamples of variables, mixing both variables already identified as relevant by previous models and variables randomly selected among the other variables. As our main contribution, we provide an in-depth theoretical analysis of this method in the infinite-sample setting. In particular, we study its soundness with respect to common definitions of feature relevance and its convergence speed under various variable dependence scenarios. We also provide some preliminary empirical results highlighting the potential of the approach.

Peer Reviewed
Simple connectome inference from partial correlation statistics in calcium imaging
Sutera, Antonio ULiege; Joly, Arnaud ULiege; François-Lavet, Vincent et al

in Soriano, Jordi; Battaglia, Demian; Guyon, Isabelle (Eds.) et al Neural Connectomics Challenge (2017)

In this work, we propose a simple yet effective solution to the problem of connectome inference in calcium imaging data. The proposed algorithm consists of two steps: first, processing the raw signals to detect neural peak activities; second, inferring the degree of association between neurons from partial correlation statistics. This paper summarises the methodology that led us to win the Connectomics Challenge, proposes a simplified version of our method, and finally compares our results with those of other inference methods.
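To make the second step concrete, here is a minimal sketch of a partial correlation between two signals given a third. It uses the textbook first-order formula on toy data; the abstract does not state which estimator the authors use, so the functions `pearson` and `partial_corr` and the signals `x`, `y`, `z` are illustrative assumptions.

```python
import math


def pearson(a, b):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    s_ab = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    s_aa = sum((u - mean_a) ** 2 for u in a)
    s_bb = sum((v - mean_b) ** 2 for v in b)
    return s_ab / math.sqrt(s_aa * s_bb)


def partial_corr(x, y, z):
    """First-order partial correlation of x and y, controlling for z."""
    r_xy, r_xz, r_yz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


# Toy signals: x and y are both driven by z, plus independent perturbations.
z = [0, 1, 2, 3, 4, 5, 6, 7]
noise_x = [1, -1, -1, 1, 1, -1, -1, 1]
noise_y = [1, 1, -1, -1, -1, -1, 1, 1]
x = [zi + ni for zi, ni in zip(z, noise_x)]
y = [zi + ni for zi, ni in zip(z, noise_y)]

raw = pearson(x, y)                 # 0.84: x and y look strongly associated
conditioned = partial_corr(x, y, z)  # about 0: the link vanishes given z
```

In this toy case the plain correlation suggests a direct link between `x` and `y`, while the partial correlation attributes it entirely to the common driver `z`, which is the kind of indirect effect a connectome inference method needs to discount.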

Peer Reviewed
Random subspace with trees for feature selection under memory constraints
Sutera, Antonio ULiege; Châtel, Célia; Louppe, Gilles ULiege et al

Conference (2016, September 12)

Peer Reviewed
Context-dependent feature analysis with random forests
Sutera, Antonio ULiege; Louppe, Gilles ULiege; Huynh-Thu, Vân Anh ULiege et al

in Uncertainty in Artificial Intelligence: Proceedings of the Thirty-Second Conference (2016) (2016, June)

Peer Reviewed
Decision Making from Confidence Measurement on the Reward Growth using Supervised Learning: A Study Intended for Large-Scale Video Games
Taralla, David ULiege; Qiu, Zixiao ULiege; Sutera, Antonio ULiege et al

in Proceedings of the 8th International Conference on Agents and Artificial Intelligence (ICAART 2016) - Volume 2 (2016, February)

Video games have become more and more complex over the past decades. Today, players wander in visually and option-rich environments, and each choice they make, at any given time, can have a combinatorial number of consequences. However, modern artificial intelligence is still usually hard-coded, and as the game environments become increasingly complex, this hard-coding becomes exponentially difficult. Recent research has started to let video game autonomous agents learn instead of being taught, which makes them more intelligent. This contribution falls under this very perspective, as it aims to develop a framework for the generic design of autonomous agents for large-scale video games. We consider a class of games for which expert knowledge is available to define a state quality function that gives how close an agent is to its objective. The decision making policy is based on a confidence measurement on the growth of the state quality function, computed by a supervised learning classification model. Additionally, no stratagems aiming to reduce the action space are used. As a proof of concept, we tested this simple approach on the collectible card game Hearthstone and obtained encouraging results.

Peer Reviewed
Simple connectome inference from partial correlation statistics in calcium imaging
Sutera, Antonio ULiege; Joly, Arnaud ULiege; François-Lavet, Vincent ULiege et al

in Soriano, Jordi; Battaglia, Demian; Guyon, Isabelle (Eds.) et al Neural Connectomics Challenge (2014)

In this work, we propose a simple yet effective solution to the problem of connectome inference in calcium imaging data. The proposed algorithm consists of two steps: first, processing the raw signals to detect neural peak activities; second, inferring the degree of association between neurons from partial correlation statistics. This paper summarises the methodology that led us to win the Connectomics Challenge, proposes a simplified version of our method, and finally compares our results with those of other inference methods.

Peer Reviewed
Understanding variable importances in forests of randomized trees
Louppe, Gilles ULiege; Wehenkel, Louis ULiege; Sutera, Antonio ULiege et al

in Advances in Neural Information Processing Systems 26 (2013, December)

Despite growing interest and practical use in various scientific areas, variable importances derived from tree-based ensemble methods are not well understood from a theoretical point of view. In this work we characterize the Mean Decrease Impurity (MDI) variable importances as measured by an ensemble of totally randomized trees in asymptotic sample and ensemble size conditions. We derive a three-level decomposition of the information jointly provided by all input variables about the output in terms of i) the MDI importance of each input variable, ii) the degree of interaction of a given input variable with the other input variables, iii) the different interaction terms of a given degree. We then show that this MDI importance of a variable is equal to zero if and only if the variable is irrelevant and that the MDI importance of a relevant variable is invariant with respect to the removal or the addition of irrelevant variables. We illustrate these properties on a simple example and discuss how they may change in the case of non-totally randomized trees such as Random Forests and Extra-Trees.

Characterization of variable importance measures derived from decision trees
Sutera, Antonio ULiege

Master's dissertation (2013)

In the context of machine learning, tree-based ensemble methods are common techniques used for prediction and explanation purposes in many research fields such as genetics. These methods consist in building, by randomization, several decision trees and then aggregating their predictions. From an ensemble of trees, one can derive an importance score for each variable of the problem that assesses its relevance for predicting the output. Although these importance scores have been successfully exploited in many applications, they are not well understood and in particular, they lack a theoretical characterization. In this context, this work is a first step towards providing a better understanding of these measures from a theoretical and an empirical point of view. First, we derive, and verify empirically, an analytical formulation of the importance scores obtained from an ensemble of totally randomized trees in asymptotic conditions (i.e., infinite number of trees and infinite sample size). We then study empirically importance score distributions derived from totally randomized tree ensembles in non-asymptotic conditions for several simple input-output models. In particular, we show theoretically and empirically the insensitivity of importance scores with respect to the introduction of irrelevant variables for these simple models. We then evaluate the effect of a reduction of the randomization on importance scores and their distribution. Finally, tree-based importance measures are illustrated on a digit recognition problem.
