Abstract:
[en] We investigate several global variable importance measures derived from artificial neural networks (ANN) to address the challenging problem of feature ranking in high-dimensional unstructured problems. While several (local) ANN importance measures have been validated on computer vision and natural language processing tasks, it is not clear how these methods perform on unstructured problems where many variables are expected to be irrelevant. We empirically compare these ANN measures with a standard, state-of-the-art random forest (RF) importance measure on several artificial and real datasets. These experiments show that ANN measures can achieve performance similar to the RF measure and sometimes outperform it. On some problems, however, the feature rankings returned by ANN are not as good as those returned by RF, despite significantly better predictive performance. Importantly, reaching the best performance with the ANN-based methods often requires introducing a so-called selection layer at the beginning of the network. This specific neural architecture proved critical for both feature ranking and predictive performance on datasets with many irrelevant variables. Finally, we evaluate these methods on the problem of gene network inference, where they yield reasonable performance but do not outperform RF.
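The abstract does not define the selection layer, so the following is only a minimal sketch under stated assumptions: the layer is taken to be an element-wise gate with one trainable weight per input variable, placed before the rest of the network. To keep the example short, the "network" after the gate is a single linear unit, and the importance score |s_j * w_j| is an illustrative choice rather than one of the measures studied in the paper. The function names (`make_toy_data`, `train_selection_net`, `rank_features`) and the toy dataset are hypothetical.

```python
import numpy as np


def make_toy_data(n_samples=200, n_features=10, seed=0):
    """Toy regression problem: only features 0 and 1 are relevant (assumed setup)."""
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((n_samples, n_features))
    y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + 0.1 * rng.standard_normal(n_samples)
    return X, y


def train_selection_net(X, y, lr=0.05, n_steps=2000):
    """Fit y ~ (X * s) @ w by gradient descent on the mean squared error.

    s is the assumed selection layer (one gate per input variable);
    w stands in for the rest of the network (here a single linear unit).
    """
    n_samples, n_features = X.shape
    s = np.ones(n_features)   # gates start fully open
    w = np.zeros(n_features)  # downstream weights start at zero
    for _ in range(n_steps):
        residual = (X * s) @ w - y
        # Gradient of 0.5 * mean(residual**2) with respect to the product s * w.
        g = X.T @ residual / n_samples
        # Chain rule: d/ds = w * g and d/dw = s * g (simultaneous update).
        s, w = s - lr * w * g, w - lr * s * g
    return s, w


def rank_features(s, w):
    """Rank variables by the illustrative score |s_j * w_j|, highest first."""
    importance = np.abs(s * w)
    return np.argsort(-importance), importance


if __name__ == "__main__":
    X, y = make_toy_data()
    s, w = train_selection_net(X, y)
    order, importance = rank_features(s, w)
    print("ranking:", order)
    print("importance:", np.round(importance, 3))
```

In this toy setting the gate and the downstream weight of an irrelevant variable both stay close to their initial contribution of zero, so the two relevant variables should appear at the top of the ranking. A real ANN would replace the linear unit with hidden layers and would typically add regularization on the gates.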