DECEMBER 2022 IEEE GEOSCIENCE AND REMOTE SENSING MAGAZINE 231

(a) [Scatter plot: ImageNet Top-1 Accuracy (%) versus Number of Parameters (M), one labeled point per method.]

(b)

| Method | Architecture | PARAMS (M) | Top-1 ACC |
|---|---|---|---|
| BigBiGAN [116] | R50 | 24 | 56.6 |
| MAE [58] | ViT-L/16 | 307 | 73.5 |
| RelativePosition [62] | R50w2X | 94 | 51.4 |
| Jigsaw [63] | R50w2X | 94 | 44.6 |
| Rotation [49] | Rv50w4X | 86 | 55.4 |
| Colorization [65] | R101 | 28 | 39.6 |
| CPC-v1 [71] | R101 | 28 | 48.7 |
| CPC-v2 [169] | R161 | 305 | 71.5 |
| AMDIM [171] | R-custom | 194 | 63.5 |
| CMC [174] | R50 | 47 | 66.2 |
| InstDisc [73] | R50 | 24 | 54 |
| PIRL [74] | R50 | 24 | 63.6 |
| MoCo [75] | R50 | 24 | 60.6 |
| MoCo-2X | R50w2X | 94 | 65.4 |
| SimCLR [76] | R50 | 24 | 69.3 |
| SimCLR-2X | R50w2X | 94 | 74.2 |
| MoCo-v2 [78] | R50 | 24 | 71.1 |
| SimCLR-v2 [176] | R50 | 24 | 71.7 |
| MoCo-v3 (ViT-B) [77] | ViT-B/16 | 86 | 76.7 |
| MoCo-v3 (ViT-L) | ViT-L/16 | 304 | 81 |
| DeepCluster [51] | VGG | 15 | 48.4 |
| LocalAgg [79] | R50 | 24 | 60.2 |
| SwAV [81] | R50 | 24 | 75.3 |
| SwAV-2X | R50w2X | 94 | 77.3 |
| Barlow Twins [87] | R50 | 24 | 73.2 |
| VICReg [88] | R50 | 24 | 73.2 |
| BYOL [83] | R50 | 24 | 74.3 |
| SimSiam [84] | R50 | 24 | 71.3 |
| DINO (RN) [52] | R50 | 24 | 75.3 |
| DINO (ViT) | ViT-B/16 | 85 | 78.2 |
| EsViT [85] | Swin-B | 87 | 81.3 |
| iBOT [86] | ViT-L/16 | 307 | 81.7 |
| Supervised (ResNet) | R50 | 24 | 76.5 |
| Supervised (ViT) | ViT-B/16 | 85 | 79.9 |

PARAMS: Parameters; ACC: Accuracy

FIGURE 18. (a) and (b) A comparison of SSL methods under the linear classification protocol on ImageNet [2]. All are reported as unsupervised pretraining on the ImageNet-1M training set followed by supervised linear classification trained on frozen features, evaluated on the validation set. The parameter counts are those of the feature extractors, which are commonly ResNet [17] or ViT [18]. RN: ResNet; VGG: Visual Geometry Group.
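The linear classification protocol named in the caption of Figure 18 can be illustrated with a minimal sketch. It is not the evaluation code of any method in the figure. It assumes a fixed random projection as a stand-in for a pretrained feature extractor, synthetic Gaussian clusters in place of ImageNet, and plain full-batch gradient descent for the linear classifier. All function names and hyperparameters are hypothetical. What it preserves from the protocol is the essential structure: the extractor is frozen, only a linear layer is trained with labels, and top-1 accuracy is measured on a held-out split.

```python
import numpy as np

def frozen_features(x, w_frozen):
    """Stand-in for a pretrained encoder (assumption): fixed projection + ReLU, never updated."""
    return np.maximum(x @ w_frozen, 0.0)

def train_linear_probe(feats, labels, num_classes, lr=0.1, epochs=300):
    """Fit a softmax linear classifier on frozen features by full-batch gradient descent."""
    n, d = feats.shape
    w = np.zeros((d, num_classes))
    b = np.zeros(num_classes)
    onehot = np.eye(num_classes)[labels]
    for _ in range(epochs):
        logits = feats @ w + b
        logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
        probs = np.exp(logits)
        probs = probs / probs.sum(axis=1, keepdims=True)
        grad = (probs - onehot) / n  # gradient of mean cross-entropy w.r.t. logits
        w -= lr * feats.T @ grad
        b -= lr * grad.sum(axis=0)
    return w, b

def top1_accuracy(feats, labels, w, b):
    """Fraction of samples whose highest-scoring class matches the label."""
    return float(np.mean(np.argmax(feats @ w + b, axis=1) == labels))

rng = np.random.default_rng(0)
num_classes, dim, feat_dim = 3, 8, 16
centers = rng.normal(scale=3.0, size=(num_classes, dim))  # synthetic class means

def make_split(n_per_class):
    """Draw a labeled split of synthetic data around the class means."""
    labels = np.repeat(np.arange(num_classes), n_per_class)
    x = centers[labels] + rng.normal(size=(labels.size, dim))
    return x, labels

x_train, y_train = make_split(100)
x_val, y_val = make_split(50)

# The encoder weights are created once and only read afterwards (frozen).
w_frozen = rng.normal(size=(dim, feat_dim)) / np.sqrt(dim)
f_train = frozen_features(x_train, w_frozen)
f_val = frozen_features(x_val, w_frozen)

# Normalize with statistics from the training split only.
mean, scale = f_train.mean(axis=0), f_train.std()
f_train = (f_train - mean) / scale
f_val = (f_val - mean) / scale

w, b = train_linear_probe(f_train, y_train, num_classes)
val_acc = top1_accuracy(f_val, y_val, w, b)
print(f"top-1 accuracy on held-out split: {100 * val_acc:.1f}%")
```

Because only `w` and `b` are updated, the resulting accuracy reflects how linearly separable the frozen features are, which is why this kind of protocol is used to compare representations of different pretraining methods.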