Working paper (E-prints, working papers and research blog)
Assessing Random Forest self-reproducibility for optimal short biomarker signature discovery
Poulet, Christophe; Debit, Ahmed; Josse, Claire et al.
2023
 

Files


Full Text
2023.03.29.534695v1.full-3.pdf
Author preprint (738.3 kB)

Details



Keywords :
random forest; reproducibility; hyperstability; biomarker; signature
Abstract :
[en] Biomarker signature discovery remains the main path to developing clinical diagnostic tools when biological knowledge of a pathology is weak. Shorter signatures are often preferred because they reduce the cost of the diagnostic test. The ability to find the best and shortest signature relies on the robustness of the models that can be built on such a set of molecules. The classification algorithm to be used is selected based on the average performance of its models, often expressed as the average AUC. However, it is not guaranteed that an algorithm with a wide AUC distribution will maintain stable performance when facing new data. Here, we propose two AUC-derived hyper-stability scores, the HRS and the HSS, as metrics complementary to the average AUC that should bring confidence to the choice of the best classification algorithm. To emphasize the importance of these scores, we compared 15 different Random Forest implementations. Additionally, the modeling time of each implementation was computed to further help in deciding on the best strategy. Our findings show that the Random Forest implementation should be chosen according to the data at hand and the classification question being evaluated. No Random Forest implementation can be used universally for any classification task and on any dataset. Each should be tested for both its average AUC performance and its AUC-derived stability prior to analysis.
Author summary: To better measure the performance of a Machine Learning (ML) implementation, we introduce a new metric, the AUC hyper-stability, to be used in parallel with the average AUC. This AUC hyper-stability is able to discriminate between ML implementations that show the same average AUC performance. This metric can therefore help researchers choose the best ML method for obtaining stable, short predictive biomarker signatures. More specifically, we advocate a tradeoff between the average AUC performance, the hyper-stability scores, and the modeling time.
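The abstract does not define the HRS or the HSS, so the sketch below does not attempt to reproduce them. It only illustrates the underlying observation that two implementations can share the same average AUC while differing in how widely their AUC values spread across repeated runs. The function names, the use of standard deviation and range as spread measures, and the AUC values in `impl_a` and `impl_b` are all hypothetical and chosen purely for illustration; they are not results from the paper.

```python
from statistics import mean, pstdev


def auc(labels, scores):
    """Rank-based AUC: the fraction of (positive, negative) pairs in which
    the positive sample gets the higher score; ties count as half a win."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def summarize(aucs):
    """Average AUC plus two simple spread measures over repeated runs.
    These spread measures are stand-ins, NOT the paper's HRS/HSS scores."""
    return {
        "mean_auc": mean(aucs),
        "sd_auc": pstdev(aucs),
        "range_auc": max(aucs) - min(aucs),
    }


# Hypothetical AUC values from five repeated runs of two implementations
# (made-up numbers for illustration only).
impl_a = [0.80, 0.81, 0.79, 0.80, 0.80]
impl_b = [0.70, 0.90, 0.75, 0.85, 0.80]

summary_a = summarize(impl_a)
summary_b = summarize(impl_b)

if __name__ == "__main__":
    print("A:", summary_a)
    print("B:", summary_b)
```

In this toy setting both implementations have the same average AUC, yet the second varies far more from run to run, which is the kind of difference a stability score reported alongside the average AUC is meant to expose.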
Disciplines :
Life sciences: Multidisciplinary, general & others
Author, co-author :
Poulet, Christophe   ;  Centre Hospitalier Universitaire de Liège - CHU > Service de rhumatologie
Debit, Ahmed   ;  Université de Liège - ULiège > GIGA > GIGA Cancer - Human Genetics
Josse, Claire  ;  Centre Hospitalier Universitaire de Liège - CHU > Service d'oncologie médicale
Jerusalem, Guy  ;  Centre Hospitalier Universitaire de Liège - CHU > Service d'oncologie médicale
Azencott, Chloe-Agathe 
Bours, Vincent  ;  Centre Hospitalier Universitaire de Liège - CHU > Service de génétique
Van Steen, Kristel  ;  Université de Liège - ULiège > Département d'électricité, électronique et informatique (Institut Montefiore) > Bioinformatique
 These authors have contributed equally to this work.
Language :
English
Title :
Assessing Random Forest self-reproducibility for optimal short biomarker signature discovery
Publication date :
01 April 2023
Available on ORBi :
since 28 May 2023
