Krizhevsky, A., Sutskever, I., Hinton, G.: ImageNet classification with deep convolutional neural networks. Commun. ACM 60(6), 84–90 (2017). https://doi.org/10.1145/3065386
Wightman, R.: PyTorch image models. GitHub (2019). https://github.com/huggingface/pytorch-image-models/blob/main/results/results-imagenet.csv
Ozbulak, U., et al.: Know your self-supervised learning: a survey on image-based generative and discriminative training. Trans. Mach. Learn. Res. (2023). https://openreview.net/forum?id=Ma25S4ludQ
Beyer, L., Hénaff, O., Kolesnikov, A., Zhai, X., van den Oord, A.: Are we done with ImageNet? arXiv preprint (2020). http://arxiv.org/abs/2006.07159
Tsipras, D., Santurkar, S., Engstrom, L., Ilyas, A., Madry, A.: From ImageNet to image classification: contextualizing progress on benchmarks. In: 37th International Conference on Machine Learning, Article no. 896, pp. 9625–9635 (2020). https://dl.acm.org/doi/10.5555/3524938.3525830
Vasudevan, V., Caine, B., Gontijo-Lopes, R., Fridovich-Keil, S., Roelofs, R.: When does dough become a bagel? Analyzing the remaining mistakes on ImageNet. In: NeurIPS (2022). https://openreview.net/pdf?id=mowt1WNhTC7
Recht, B., Roelofs, R., Schmidt, L., Shankar, V.: Do ImageNet classifiers generalize to ImageNet? In: 36th International Conference on Machine Learning (2019). http://proceedings.mlr.press/v97/recht19a/recht19a.pdf
Engstrom, L., Ilyas, A., Santurkar, S., Tsipras, D., Steinhardt, J., Madry, A.: Identifying statistical bias in dataset replication. In: 37th International Conference on Machine Learning (2020). http://proceedings.mlr.press/v119/engstrom20a/engstrom20a.pdf
Anzaku, E., Wang, H., Van Messem, A., De Neve, W.: A principled evaluation protocol for comparative investigation of the effectiveness of DNN classification models on similar-but-non-identical datasets. arXiv preprint (2022). http://arxiv.org/abs/2209.01848
Shankar, V., Roelofs, R., Mania, H., Fang, A., Recht, B., Schmidt, L.: Evaluating machine accuracy on ImageNet. In: 37th International Conference on Machine Learning, vol. 119, pp. 8634–8644 (2020). https://proceedings.mlr.press/v119/shankar20c.html
Northcutt, C., Athalye, A., Mueller, J.: Pervasive label errors in test sets destabilize machine learning benchmarks. In: Proceedings of the Neural Information Processing Systems Track on Datasets and Benchmarks (2021). https://openreview.net/pdf?id=XccDXrDNLek
Luccioni, A., Rolnick, D.: Bugs in the data: how ImageNet misrepresents biodiversity. In: Proceedings of the Thirty-Fifth Conference on Innovative Applications of Artificial Intelligence and Thirteenth Symposium on Educational Advances in Artificial Intelligence, Article no. 1613, pp. 14382–14390 (2023). https://dl.acm.org/doi/10.1609/aaai.v37i12.26682