Article (Scientific journals)
Illuminating The Path To Enhanced Resilience Of Machine Learning Models Against The Shadows Of Missing Labels.
Simankov, Nikolay; Tahzima, Rachid; Massart, Sébastien et al.
2025. In IEEE Journal of Biomedical and Health Informatics, PP, pp. 1-11
Peer Reviewed verified by ORBi
 

Files


Full Text
IEEE_JBHI___Illuminating_the_path_against_the_shadows_of_missing_labels.v4.pdf
Author preprint (616.61 kB) Creative Commons License - Attribution, Non-Commercial, ShareAlike

All documents in ORBi are protected by a user license.




Details



Abstract :
[en] The sensitivity of state-of-the-art supervised classification models is compromised by contamination-prone biomedical datasets, which are vulnerable to missing or erroneous labels (i.e., inliers). Using codon frequencies, electrocardiogram signals, biomarkers, morphological features, and patient questionnaires, we aimed to cover a wide range of typical biomedical databases at risk of having samples with missing labels recorded as negatives (inlier contamination). In some fields, such as image recognition, missing labels have received considerable attention, but in biomedical and clinical research, where outliers are almost systematically filtered out, inliers have been largely overlooked. Our study introduced a pragmatic, automated methodology that repurposes ("upcycles") one-class semi-supervised anomaly detection (OCSSAD) models to filter potential inliers out of training datasets. Five OCSSAD methods and two ensemble methods were benchmarked on six databases with 10 contamination levels and 10 random samples, achieving an average Matthews correlation coefficient (MCC) of 78±17% in validation, whereas 22 supervised classifiers trained on the complete, uncontaminated training set achieved an average MCC of 81±9%. Filtering the training set with an isolation forest increased the average resilience to inliers of the 22 tested machine learning models, including neural networks and gradient-boosting methods, from 69±11% to 95±1%. Taken together, our study demonstrated the efficacy of this versatile approach in enhancing the resilience of machine learning models and highlighted the importance of addressing the inlier challenge in the medical and life sciences.
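The abstract reports that filtering the training set with an isolation forest raised the models' resilience to inliers. The sketch below is a minimal, dependency-free illustration of one plausible reading of that filtering step, not the authors' implementation. It assumes the forest is fitted on the samples labelled positive only (the "one class"), scores every sample labelled negative, and flags as suspected inliers the fraction it finds least anomalous (most positive-like), with that fraction assumed known in advance. The helper names (`flag_suspected_inliers`, `mcc`, `c_factor`, etc.), the hyperparameters, and the synthetic data are hypothetical. The forest follows the standard isolation-forest construction (random feature, random split value, path lengths normalised by c(n)), and the MCC, the metric used in the abstract, is computed here on the filter's own detections.

```python
import math
import random

def c_factor(n):
    """Average path length of an unsuccessful BST search over n points,
    used to normalise isolation depths (Liu et al., 2008)."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    return 2.0 * (math.log(n - 1) + 0.5772156649) - 2.0 * (n - 1) / n

def build_tree(points, depth, max_depth, rng):
    """Grow one isolation tree with random features and random split values."""
    if depth >= max_depth or len(points) <= 1:
        return ("leaf", len(points))
    feat = rng.randrange(len(points[0]))
    lo = min(p[feat] for p in points)
    hi = max(p[feat] for p in points)
    if lo == hi:  # simplification: stop if the chosen feature is constant here
        return ("leaf", len(points))
    split = rng.uniform(lo, hi)
    left = [p for p in points if p[feat] < split]
    right = [p for p in points if p[feat] >= split]
    return ("node", feat, split,
            build_tree(left, depth + 1, max_depth, rng),
            build_tree(right, depth + 1, max_depth, rng))

def path_length(x, node):
    """Depth at which x reaches a leaf, plus c(size) for unresolved points."""
    depth = 0
    while node[0] == "node":
        _, feat, split, left, right = node
        node = left if x[feat] < split else right
        depth += 1
    return depth + c_factor(node[1])

class IsolationForest:
    def __init__(self, n_trees=100, sample_size=64, seed=0):
        self.n_trees, self.sample_size, self.seed = n_trees, sample_size, seed

    def fit(self, X):
        if len(X) < 2:
            raise ValueError("need at least two samples")
        rng = random.Random(self.seed)
        psi = min(self.sample_size, len(X))
        max_depth = math.ceil(math.log2(psi))
        self.trees = [build_tree(rng.sample(X, psi), 0, max_depth, rng)
                      for _ in range(self.n_trees)]
        self.norm = c_factor(psi)
        return self

    def score(self, x):
        """Anomaly score in (0, 1]; higher means easier to isolate."""
        mean_h = sum(path_length(x, t) for t in self.trees) / len(self.trees)
        return 2.0 ** (-mean_h / self.norm)

def flag_suspected_inliers(X, y, contamination, seed=0):
    """Hypothetical filter: model the labelled positives only, then flag the
    `contamination` fraction of label-0 samples with the lowest anomaly score."""
    forest = IsolationForest(seed=seed).fit([x for x, t in zip(X, y) if t == 1])
    neg_idx = sorted((i for i, t in enumerate(y) if t == 0),
                     key=lambda i: forest.score(X[i]))
    return set(neg_idx[:round(contamination * len(neg_idx))])

def mcc(y_true, y_pred):
    """Matthews correlation coefficient for binary 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == 1 and p == 1 for t, p in pairs)
    tn = sum(t == 0 and p == 0 for t, p in pairs)
    fp = sum(t == 0 and p == 1 for t, p in pairs)
    fn = sum(t == 1 and p == 0 for t, p in pairs)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0

# Synthetic demo: 100 positives near (5, 5), 100 negatives near (0, 0);
# the last 20 positives lose their label and are recorded as negatives.
rng = random.Random(42)
X = ([(rng.gauss(5, 0.5), rng.gauss(5, 0.5)) for _ in range(100)]
     + [(rng.gauss(0, 0.5), rng.gauss(0, 0.5)) for _ in range(100)])
y = [1] * 80 + [0] * 120
hidden = set(range(80, 100))

flagged = flag_suspected_inliers(X, y, contamination=20 / 120)
label0 = [i for i, t in enumerate(y) if t == 0]
detection_mcc = mcc([int(i in hidden) for i in label0],
                    [int(i in flagged) for i in label0])
X_filtered = [x for i, x in enumerate(X) if i not in flagged]
y_filtered = [t for i, t in enumerate(y) if i not in flagged]
```

In practice, scikit-learn's `sklearn.ensemble.IsolationForest` and `sklearn.metrics.matthews_corrcoef` provide tested equivalents; the pure-Python version is kept here only so the example runs without dependencies. The filtered training set (`X_filtered`, `y_filtered`) would then be passed to whichever supervised classifier is being evaluated.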
Disciplines :
Computer science
Life sciences: Multidisciplinary, general & others
Human health sciences: Multidisciplinary, general & others
Author, co-author :
Simankov, Nikolay  ;  Université de Liège - ULiège > Département GxABT > Gestion durable des bio-agresseurs
Tahzima, Rachid  ;  Université de Liège - ULiège > Département GxABT > Gestion durable des bio-agresseurs
Massart, Sébastien  ;  Université de Liège - ULiège > Département GxABT > Gestion durable des bio-agresseurs
Soyeurt, Hélène  ;  Université de Liège - ULiège > Département GxABT > Modélisation et développement
Language :
English
Title :
Illuminating The Path To Enhanced Resilience Of Machine Learning Models Against The Shadows Of Missing Labels.
Publication date :
08 April 2025
Journal title :
IEEE Journal of Biomedical and Health Informatics
ISSN :
2168-2194
eISSN :
2168-2208
Publisher :
Institute of Electrical and Electronics Engineers (IEEE), United States
Volume :
PP
Pages :
1-11
Funders :
F.R.S.-FNRS - Fonds de la Recherche Scientifique
Funding number :
FRIA grant No. FC 52719
Available on ORBi :
since 25 April 2025
