shvit_s2 Fine-tuned on EuroSAT

This model is a fine-tuned version of SHViT (Single-Head Vision Transformer) on the EuroSAT dataset.

SHViT was introduced in the CVPR 2024 paper "SHViT: Single-Head Vision Transformer with Memory Efficient Macro Design" by Seokju Yun and Youngmin Ro.

Model Description

Base Model: shvit_s2
Fine-tuned Dataset: EuroSAT
Number of Classes: 10
Input Resolution: 224x224
Framework: PyTorch / timm

Performance

Test Accuracy (100% data): 93.83%
Data Efficiency Score: 0.728
Data for 90% Performance: 55.0%
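
The card does not define how "Data for 90% Performance" is computed. One plausible reading is the smallest fraction of training data at which test accuracy reaches 90% of the full-data accuracy, found by linear interpolation along the learning curve. The sketch below assumes that definition. The function name and the learning-curve values are hypothetical round numbers chosen for illustration, not the measured results of this model.

```python
def fraction_for_target(curve, target_ratio=0.9):
    """Smallest data fraction whose accuracy reaches target_ratio * full-data accuracy.

    curve: iterable of (data_fraction, accuracy) pairs that includes the
    full-data point (fraction 1.0). Assumed definition, not the card's.
    """
    points = sorted(curve)
    full_accuracy = points[-1][1]
    target = target_ratio * full_accuracy

    prev_fraction, prev_accuracy = points[0]
    if prev_accuracy >= target:
        return prev_fraction

    for fraction, accuracy in points[1:]:
        if accuracy >= target:
            # Linear interpolation between the two neighboring measurements.
            t = (target - prev_accuracy) / (accuracy - prev_accuracy)
            return prev_fraction + t * (fraction - prev_fraction)
        prev_fraction, prev_accuracy = fraction, accuracy
    return None  # only reached if the curve never attains the target


# Hypothetical learning curve (fraction of training data, accuracy in %).
example_curve = [(0.1, 50.0), (0.25, 60.0), (0.5, 80.0), (1.0, 100.0)]
needed = fraction_for_target(example_curve)
```

With this made-up curve the target is 90% accuracy, which falls halfway between the 50% and 100% data points.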

Dataset

EuroSAT: Satellite image classification (10 land use classes)

Classes: 10
Image Size: 64x64 → 224x224 (resized)

Training Details

This model was trained as part of a comprehensive analysis comparing SHViT with baseline models (DeiT-Tiny, MobileNetV2) across multiple dimensions:

Robustness to corruptions (noise, blur, weather effects)
Data efficiency across different training data fractions
Geometric invariance (rotation, crop, color changes)
Domain adaptation capabilities
Representation similarity analysis
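
As an illustration of the robustness dimension, the following minimal sketch applies Gaussian noise at a chosen severity and measures accuracy on the corrupted images. The noise levels, the function names, and the `predict_fn` interface are assumptions made for this example; the card does not state which corruption parameters the analysis used.

```python
import numpy as np

# Hypothetical noise standard deviations on a 0-1 pixel scale, one per severity.
NOISE_STD = (0.04, 0.08, 0.12, 0.18, 0.26)


def gaussian_noise(image, severity=1, seed=0):
    """Return a noisy copy of a uint8 image of shape (H, W, C)."""
    rng = np.random.default_rng(seed)
    x = image.astype(np.float32) / 255.0
    x = x + rng.normal(0.0, NOISE_STD[severity - 1], size=x.shape)
    return (np.clip(x, 0.0, 1.0) * 255.0).round().astype(np.uint8)


def accuracy_under_noise(predict_fn, images, labels, severity):
    """Accuracy of predict_fn (uint8 image -> class index) on noisy images."""
    correct = 0
    for i, (image, label) in enumerate(zip(images, labels)):
        # A per-image seed keeps the corruption reproducible across models.
        correct += int(predict_fn(gaussian_noise(image, severity, seed=i)) == label)
    return correct / len(labels)
```

Fixing the seed per image means every model compared sees exactly the same corrupted inputs.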

Training configuration:

Optimizer: AdamW
Learning rate schedule: Cosine decay
Augmentation: RandAugment, Random Erasing
Input size: 224×224
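
The optimizer and schedule listed above can be set up in PyTorch roughly as follows. This is a sketch only: the learning rate, weight decay, and epoch count are hypothetical placeholders because the card does not state them, and a small linear layer stands in for the SHViT network.

```python
import torch
from torch import nn

# Hypothetical hyperparameters; the card does not report the actual values.
EPOCHS, BASE_LR, WEIGHT_DECAY = 100, 1e-3, 0.05

model = nn.Linear(8, 10)  # stand-in for the SHViT model
optimizer = torch.optim.AdamW(model.parameters(), lr=BASE_LR, weight_decay=WEIGHT_DECAY)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=EPOCHS)

lrs = []
for epoch in range(EPOCHS):
    lrs.append(optimizer.param_groups[0]["lr"])
    # ... one epoch of training (forward, backward) would run here ...
    optimizer.step()   # the optimizer steps before the scheduler
    scheduler.step()   # cosine decay, applied once per epoch
```

The recorded `lrs` start at the base rate, pass through half of it at the midpoint, and approach zero by the final epoch.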

Usage

import torch
from timm import create_model

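# Load model (requires a timm version that registers 'shvit_s2', or the SHViT model definition)
model = create_model('shvit_s2', num_classes=10, pretrained=False)

# Load checkpoint (torch.hub expects an HTTP(S) URL; it does not resolve hf:// paths)
checkpoint = torch.hub.load_state_dict_from_url(
    'https://huggingface.co/YOUR_USERNAME/shvit_s2-eurosat/resolve/main/checkpoint_99.pth',
    map_location='cpu',
)
model.load_state_dict(checkpoint['model'])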
model.eval()

# Use for inference
# (your image preprocessing code here)

Alternatively, download the checkpoint with huggingface_hub:

from huggingface_hub import hf_hub_download
import torch

# Download checkpoint
checkpoint_path = hf_hub_download(
    repo_id="YOUR_USERNAME/shvit_s2-eurosat",
    filename="checkpoint_99.pth"
)

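# Load the checkpoint on CPU (building the model requires timm and the SHViT model definition)
checkpoint = torch.load(checkpoint_path, map_location="cpu")
# ... load into your model, e.g. model.load_state_dict(checkpoint["model"])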
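
Both snippets above leave preprocessing open. A minimal sketch of that step is shown here: it resizes 64x64 EuroSAT RGB images to 224x224 and normalizes them before prediction. The ImageNet mean and standard deviation, the bicubic interpolation, and the helper names `preprocess` and `predict` are assumptions; check them against the actual training configuration before relying on the outputs.

```python
import torch
import torch.nn.functional as F

# Assumption: ImageNet statistics, a common default for timm models.
MEAN = torch.tensor([0.485, 0.456, 0.406]).view(1, 3, 1, 1)
STD = torch.tensor([0.229, 0.224, 0.225]).view(1, 3, 1, 1)


def preprocess(images_uint8):
    """Map uint8 RGB images (N, H, W, 3) to normalized floats (N, 3, 224, 224)."""
    x = images_uint8.permute(0, 3, 1, 2).float() / 255.0
    # Assumption: bicubic upsampling; clamp because bicubic can overshoot [0, 1].
    x = F.interpolate(x, size=(224, 224), mode="bicubic", align_corners=False)
    x = x.clamp(0.0, 1.0)
    return (x - MEAN) / STD


@torch.no_grad()
def predict(model, images_uint8):
    """Return the predicted class index for each image in the batch."""
    model.eval()
    logits = model(preprocess(images_uint8))
    return logits.argmax(dim=1)
```

Calling `predict(model, batch)` with the loaded model and a uint8 tensor of shape (N, 64, 64, 3) returns N class indices in the range 0 to 9. The mapping from index to EuroSAT class name depends on how the training labels were encoded.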

Analysis Repository

This model is part of a comprehensive analysis project. The full analysis code, scripts, and additional models are available at:

GitHub: [Your GitHub Repository]
Paper/Report: [If available]

Analysis Scripts Include:

Learning curve analysis across data fractions
Robustness evaluation under various corruptions
Geometric invariance testing (rotation, crop, color)
Domain shift and transfer learning experiments
Representation similarity (CKA, CCA) analysis
Gradient-based saliency visualizations
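
For the representation similarity item, a compact sketch of linear CKA (Kornblith et al., 2019) is given here. It is a generic formulation, not necessarily the variant implemented in the analysis scripts, and the function name `linear_cka` is chosen for this example.

```python
import numpy as np


def linear_cka(x, y):
    """Linear CKA between activations x (n, d1) and y (n, d2) for the same n inputs."""
    # Center each feature so the similarity ignores constant offsets.
    x = x - x.mean(axis=0, keepdims=True)
    y = y - y.mean(axis=0, keepdims=True)
    cross = np.linalg.norm(y.T @ x, ord="fro") ** 2
    norm_x = np.linalg.norm(x.T @ x, ord="fro")
    norm_y = np.linalg.norm(y.T @ y, ord="fro")
    return float(cross / (norm_x * norm_y))
```

In use, `x` and `y` would hold flattened activations from two layers or two models (for example SHViT and DeiT-Tiny) on the same batch of images. A value near 1 indicates representations that agree up to rotation and isotropic scaling.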

Citation

If you use this model, please cite the original SHViT paper:

@inproceedings{yun2024shvit,
  author={Yun, Seokju and Ro, Youngmin},
  title={SHViT: Single-Head Vision Transformer with Memory Efficient Macro Design},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  pages={5756--5767},
  year={2024}
}

And if you found this fine-tuned model or analysis useful:

@misc{shvit_s2_eurosat,
  author = {Your Name},
  title = {shvit_s2 Fine-tuned on EuroSAT},
  year = {2024},
  publisher = {Hugging Face},
  howpublished = {\url{https://huggingface.co/YOUR_USERNAME/shvit_s2-eurosat}},
}

Collaborators

This work was done in collaboration with:

Vishal V
Priyal Garg

License

This model is released under the MIT license. The original SHViT implementation is also MIT-licensed.

Acknowledgments

Original SHViT authors: Seokju Yun and Youngmin Ro
Built using timm
Trained models and analysis scripts available in our repository
