Article (Scientific journals)
Re-assessing accuracy degradation: a framework for understanding DNN behavior on similar-but-non-identical test datasets
Anzaku, Esla Timothy; Wang, Haohan; Babalola, Ajiboye et al.
2025, In Machine Learning, 114(3)
Peer Reviewed verified by ORBi
 

Files


Full Text
0038_2025 Anzaku.pdf
Author postprint (2.72 MB)
Details



Abstract :
[en] Deep Neural Networks (DNNs) often demonstrate remarkable performance when evaluated on the test dataset used during model creation. However, their ability to generalize effectively when deployed is crucial, especially in critical applications. One approach to assess the generalization capability of a DNN model is to evaluate its performance on replicated test datasets, which are created by closely following the same methodology and procedures used to generate the original test dataset. Our investigation focuses on the performance degradation of pre-trained DNN models in multi-class classification tasks when evaluated on these replicated datasets; this performance degradation has not been entirely explained by generalization shortcomings or dataset disparities. To address this, we introduce a new evaluation framework that leverages uncertainty estimates generated by the models studied. This framework is designed to isolate the impact of variations in the evaluated test datasets and assess DNNs based on the consistency of their confidence in their predictions. By employing this framework, we can determine whether an observed performance drop is primarily caused by model inadequacy or other factors. We applied our framework to analyze 564 pre-trained DNN models across the CIFAR-10 and ImageNet benchmarks, along with their replicated versions. Contrary to common assumptions about model inadequacy, our results indicate a substantial reduction in the performance gap between the original and replicated datasets when accounting for model uncertainty. This suggests a previously unrecognized adaptability of models to minor dataset variations. Our findings emphasize the importance of understanding dataset intricacies and adopting more nuanced evaluation methods when assessing DNN model performance. This research contributes to the development of more robust and reliable DNN models, especially in critical applications where generalization performance is of utmost importance. The code to reproduce our experiments will be available at https://github.com/esla/Reassessing_DNN_Accuracy.
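The abstract describes comparing a model's accuracy on an original and a replicated test set while conditioning on the model's own confidence. The Python sketch below is not the authors' implementation (the actual code is in the linked repository); it is a minimal, hypothetical illustration of one way such an uncertainty-aware comparison could be organized, assuming softmax probabilities are available for both test sets. The function names and the 0.9 confidence threshold are illustrative assumptions.

import numpy as np

def top1_accuracy(probs, labels):
    # Standard top-1 accuracy from softmax probabilities of shape (N, C).
    return float(np.mean(np.argmax(probs, axis=1) == labels))

def confidence_filtered_accuracy(probs, labels, threshold=0.9):
    # Accuracy restricted to samples whose top-1 confidence clears a
    # (hypothetical) threshold; also returns the fraction of samples kept.
    confidence = probs.max(axis=1)
    keep = confidence >= threshold
    if not keep.any():
        return float("nan"), 0.0
    return top1_accuracy(probs[keep], labels[keep]), float(keep.mean())

def accuracy_gap_report(probs_orig, labels_orig, probs_repl, labels_repl,
                        threshold=0.9):
    # Gap between original and replicated test accuracy, before and
    # after conditioning on model confidence.
    raw_gap = (top1_accuracy(probs_orig, labels_orig)
               - top1_accuracy(probs_repl, labels_repl))
    acc_o, cov_o = confidence_filtered_accuracy(probs_orig, labels_orig, threshold)
    acc_r, cov_r = confidence_filtered_accuracy(probs_repl, labels_repl, threshold)
    return {"raw_gap": raw_gap,
            "confidence_filtered_gap": acc_o - acc_r,
            "coverage_original": cov_o,
            "coverage_replicated": cov_r}

# Example with synthetic predictions (illustration only):
rng = np.random.default_rng(0)
logits = rng.normal(size=(1000, 10))
probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
labels = rng.integers(0, 10, size=1000)
print(accuracy_gap_report(probs, labels, probs, labels))

If the confidence-filtered gap is much smaller than the raw gap, the accuracy drop on the replicated set is concentrated in low-confidence predictions rather than reflecting a general failure of the model, which is the kind of distinction the paper's framework is intended to surface.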
Disciplines :
Computer science
Author, co-author :
Anzaku, Esla Timothy
Wang, Haohan
Babalola, Ajiboye
Van Messem, Arnout; Université de Liège - ULiège > Mathematics
De Neve, Wesley
Language :
English
Title :
Re-assessing accuracy degradation: a framework for understanding DNN behavior on similar-but-non-identical test datasets
Publication date :
14 February 2025
Journal title :
Machine Learning
ISSN :
0885-6125
eISSN :
1573-0565
Publisher :
Springer Science and Business Media LLC
Volume :
114
Issue :
3
Peer reviewed :
Peer Reviewed verified by ORBi
Available on ORBi :
since 20 February 2025

