Eprint already available on another site (E-prints, working papers and research blog)
Efficient Image Pre-Training with Siamese Cropped Masked Autoencoders
Eymaël, Alexandre; Vandeghen, Renaud; Cioppa, Anthony et al.
2024
 

Files


Full Text
Eymael2024Efficient-arxiv.pdf
Publisher postprint (25.16 MB)




Details



Keywords :
Self-supervised learning; Masked autoencoders; Siamese networks; Video segmentation; Label propagation
Abstract :
Self-supervised pre-training of image encoders is omnipresent in the literature, particularly following the introduction of masked autoencoders (MAE). Current efforts attempt to learn object-centric representations from motion in videos. In particular, SiamMAE recently introduced a Siamese network, training a shared-weight encoder from two frames of a video with a high asymmetric masking ratio (95%). In this work, we propose CropMAE, an alternative approach to the Siamese pre-training introduced by SiamMAE. Our method differs by exclusively considering pairs of images obtained by cropping the same source image in two different ways, rather than the conventional pairs of frames extracted from a video. CropMAE therefore alleviates the need for video datasets, while maintaining competitive performance and drastically reducing pre-training time. Furthermore, we demonstrate that CropMAE learns similar object-centric representations without explicit motion, showing that current self-supervised learning methods do not learn objects from motion, but rather thanks to the Siamese architecture. Finally, CropMAE achieves the highest masking ratio to date (98.5%), enabling the reconstruction of images using only two visible patches. Our code is available at https://github.com/alexandre-eymael/CropMAE.
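The pairing and masking idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation (see the linked repository for that). The function names, the fixed 224-pixel crop size, and the 16-pixel patch size are assumptions chosen for illustration only; the 98.5% masking ratio is the figure quoted in the abstract.

```python
import random


def random_crop_box(width, height, crop_size, rng):
    """Return (left, top, right, bottom) of a random square crop inside the image."""
    left = rng.randint(0, width - crop_size)
    top = rng.randint(0, height - crop_size)
    return (left, top, left + crop_size, top + crop_size)


def visible_patch_indices(num_patches, mask_ratio, rng):
    """Pick the indices of the patches that stay visible after masking."""
    num_visible = int(num_patches * (1 - mask_ratio))  # rounds down
    return sorted(rng.sample(range(num_patches), num_visible))


def make_cropped_pair(width, height, crop_size=224, patch_size=16,
                      mask_ratio=0.985, seed=0):
    """Build one training pair from a single image (hypothetical helper).

    Two different crops of the same image stand in for the two video
    frames used by video-based Siamese pre-training. Only the second
    crop is masked (asymmetric masking); the first stays fully visible.
    """
    rng = random.Random(seed)
    reference_box = random_crop_box(width, height, crop_size, rng)
    target_box = random_crop_box(width, height, crop_size, rng)
    num_patches = (crop_size // patch_size) ** 2
    visible = visible_patch_indices(num_patches, mask_ratio, rng)
    return {
        "reference_box": reference_box,  # unmasked crop
        "target_box": target_box,        # crop to be masked and reconstructed
        "num_patches": num_patches,
        "target_visible": visible,       # patch indices kept visible
    }


pair = make_cropped_pair(640, 480)
print(pair["num_patches"], len(pair["target_visible"]))  # prints "196 2"
```

Under these assumed sizes, a crop holds 196 patches, and masking 98.5% of them leaves two visible, which matches the "only two visible patches" statement in the abstract.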
Disciplines :
Computer science
Author, co-author :
Eymaël, Alexandre  ;  University of Liège, Belgium
Vandeghen, Renaud  ;  Université de Liège - ULiège > Montefiore Institute of Electrical Engineering and Computer Science
Cioppa, Anthony  ;  Université de Liège - ULiège > Montefiore Institute of Electrical Engineering and Computer Science ; KAUST, Saudi Arabia
Giancola, Silvio  ;  KAUST, Saudi Arabia
Ghanem, Bernard  ;  KAUST, Saudi Arabia
Van Droogenbroeck, Marc  ;  Université de Liège - ULiège > Département d'électricité, électronique et informatique (Institut Montefiore) > Télécommunications
 These authors have contributed equally to this work.
Language :
English
Title :
Efficient Image Pre-Training with Siamese Cropped Masked Autoencoders
Publication date :
27 March 2024
Tags :
CÉCI : Consortium des Équipements de Calcul Intensif
Tier-1 supercomputer
Funders :
F.R.S.-FNRS - Fonds de la Recherche Scientifique
Available on ORBi :
since 27 March 2024

