shvit_s2 Fine-tuned on EuroSAT

This model is SHViT (Single-Head Vision Transformer), variant shvit_s2, fine-tuned on the EuroSAT dataset.

SHViT was introduced in the CVPR 2024 paper "SHViT: Single-Head Vision Transformer with Memory Efficient Macro Design" by Seokju Yun and Youngmin Ro.

Model Description

  • Base Model: shvit_s2
  • Fine-tuned Dataset: EuroSAT
  • Number of Classes: 10
  • Input Resolution: 224x224
  • Framework: PyTorch / timm

Performance

  • Test Accuracy (100% data): 93.83%
  • Data Efficiency Score: 0.728
  • Data for 90% Performance: 55.0%
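The card does not define its data-efficiency metrics. One plausible reading of "Data for 90% Performance" is the smallest training-data fraction at which test accuracy reaches 90% of the 100%-data accuracy. A minimal sketch under that assumption, interpolating linearly between measured fractions (the function name `data_for_target` is hypothetical, and this is not the analysis repository's code):

```python
def data_for_target(fractions, accuracies, target_ratio=0.9):
    """Smallest training-data fraction reaching target_ratio * full-data accuracy.

    fractions:  increasing data fractions, the last one being 1.0 (100% of the data).
    accuracies: test accuracy measured at each fraction.
    Linear interpolation between neighbouring points is an assumption.
    """
    if not fractions or len(fractions) != len(accuracies):
        raise ValueError("need matching, non-empty lists")
    target = target_ratio * accuracies[-1]  # e.g. 90% of the 100%-data accuracy
    if accuracies[0] >= target:
        return fractions[0]
    for i in range(1, len(fractions)):
        f0, f1 = fractions[i - 1], fractions[i]
        a0, a1 = accuracies[i - 1], accuracies[i]
        if a1 >= target:
            # a0 < target <= a1 here, so the denominator is positive
            return f0 + (target - a0) * (f1 - f0) / (a1 - a0)
    return fractions[-1]
```

For a toy curve (not the measured results) with fractions [0.1, 0.25, 0.5, 1.0] and accuracies [0.60, 0.75, 0.85, 0.95], the target 0.855 is crossed between the 50% and 100% points, giving roughly 0.525.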

Dataset

EuroSAT: Sentinel-2 satellite image classification (10 land-use classes)

  • Classes: 10
  • Image Size: 64x64 → 224x224 (resized)

Training Details

This model was trained as part of a comprehensive analysis comparing SHViT with baseline models (DeiT-Tiny, MobileNetV2) across multiple dimensions:

  • Robustness to corruptions (noise, blur, weather effects)
  • Data efficiency across different training data fractions
  • Geometric invariance (rotation, crop, color changes)
  • Domain adaptation capabilities
  • Representation similarity analysis
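The corruption-robustness and geometric-invariance evaluations both compare predictions on clean and transformed inputs. A minimal NumPy sketch under stated assumptions: images are float NCHW arrays in [0, 1], `predict` is a hypothetical callable returning integer labels, the noise levels mirror the ImageNet-C Gaussian-noise settings, and a 90-degree rotation stands in for the rotation tests. None of these names come from the analysis repository.

```python
import numpy as np

# Noise sigma per severity level 1-5. These mirror the ImageNet-C Gaussian-noise
# settings and are an assumption, not necessarily the values used in this analysis.
GAUSSIAN_SIGMAS = [0.08, 0.12, 0.18, 0.26, 0.38]


def gaussian_noise(images, severity, rng):
    """Add zero-mean Gaussian noise to float images in [0, 1] and clip back to [0, 1]."""
    sigma = GAUSSIAN_SIGMAS[severity - 1]
    return np.clip(images + rng.normal(scale=sigma, size=images.shape), 0.0, 1.0)


def rotate90(images):
    """Rotate a batch of NCHW images by 90 degrees; copied so the result is contiguous."""
    return np.ascontiguousarray(np.rot90(images, k=1, axes=(2, 3)))


def accuracy(predict, images, labels):
    """Top-1 accuracy of a hypothetical `predict` callable returning integer labels."""
    return float(np.mean(predict(images) == labels))


def prediction_consistency(predict, images, transform):
    """Fraction of images whose predicted label is unchanged by `transform`."""
    return float(np.mean(predict(images) == predict(transform(images))))
```

Robustness at severity s would then be accuracy(predict, gaussian_noise(x, s, rng), y) compared against clean accuracy. Applying corruptions in [0, 1] pixel space, before mean/std normalization, keeps noise levels comparable across SHViT and the baselines.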

Training configuration:

  • Optimizer: AdamW
  • Learning rate schedule: Cosine decay
  • Augmentation: RandAugment, Random Erasing
  • Input size: 224×224
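The configuration names cosine decay without giving its parameters. The usual closed form is lr(t) = lr_min + 0.5 (lr_base - lr_min)(1 + cos(pi t / T)). A stdlib sketch of that curve follows; `base_lr`, `min_lr` and `total_steps` are hypothetical inputs rather than the values used for this model, and no warmup is modeled.

```python
import math


def cosine_lr(step, total_steps, base_lr, min_lr=0.0):
    """Cosine-decayed learning rate at `step`, from base_lr down to min_lr.

    Hyperparameters are placeholders; this card does not document them.
    """
    # Fraction of the schedule completed, clamped to [0, 1]
    progress = min(max(step / total_steps, 0.0), 1.0)
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))
```

PyTorch's torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=..., eta_min=...) follows the same curve, so in practice it can be paired directly with torch.optim.AdamW instead of a hand-written schedule.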

Usage

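import torch
from timm import create_model

# Build the architecture with a 10-class head (requires timm with the SHViT model definition)
model = create_model('shvit_s2', num_classes=10, pretrained=False)

# Load checkpoint: load_state_dict_from_url needs an http(s) URL, so use the Hub "resolve" URL
checkpoint = torch.hub.load_state_dict_from_url(
    'https://huggingface.co/YOUR_USERNAME/shvit_s2-eurosat/resolve/main/checkpoint_99.pth',
    map_location='cpu'
)
model.load_state_dict(checkpoint['model'])  # weights are stored under the 'model' key
model.eval()

# Use for inference
# (your image preprocessing code here)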
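The card does not document the evaluation preprocessing. A minimal sketch, assuming a bicubic resize straight to 224x224 and timm's ImageNet mean/std normalization, and assuming the label order is the alphabetical EuroSAT folder order (as with an ImageFolder-style loader); check these against the training code before relying on them. `preprocess` and `predict_label` are hypothetical helpers.

```python
import numpy as np
from PIL import Image

# Assumed normalization: timm's ImageNet defaults (not stated on this card)
IMAGENET_MEAN = np.array([0.485, 0.456, 0.406], dtype=np.float32)
IMAGENET_STD = np.array([0.229, 0.224, 0.225], dtype=np.float32)

# Assumed label order: alphabetical EuroSAT class folder names
EUROSAT_CLASSES = [
    "AnnualCrop", "Forest", "HerbaceousVegetation", "Highway", "Industrial",
    "Pasture", "PermanentCrop", "Residential", "River", "SeaLake",
]


def preprocess(img, size=224):
    """Resize a PIL image to size x size and return a normalized 1x3xHxW float32 array."""
    img = img.convert("RGB").resize((size, size), Image.BICUBIC)
    x = np.asarray(img, dtype=np.float32) / 255.0  # HxWx3 in [0, 1]
    x = (x - IMAGENET_MEAN) / IMAGENET_STD         # per-channel normalization
    return np.ascontiguousarray(x.transpose(2, 0, 1)[None])  # HWC -> 1xCxHxW


def predict_label(model, img):
    """Class name predicted by `model` for one PIL image."""
    import torch  # imported here so preprocess() itself does not need torch
    with torch.no_grad():
        logits = model(torch.from_numpy(preprocess(img)))
    return EUROSAT_CLASSES[int(logits.argmax(dim=1))]
```

With the model loaded as above, predict_label(model, Image.open(path)) returns a class name for a single 64x64 EuroSAT tile, where path is your own image file.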

Alternatively, download the checkpoint with the huggingface_hub client:

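from huggingface_hub import hf_hub_download
import torch
from timm import create_model

# Download checkpoint (cached locally after the first call)
checkpoint_path = hf_hub_download(
    repo_id="YOUR_USERNAME/shvit_s2-eurosat",
    filename="checkpoint_99.pth"
)

# Load model (requires timm and the SHViT model definition)
checkpoint = torch.load(checkpoint_path, map_location="cpu")
model = create_model('shvit_s2', num_classes=10, pretrained=False)
model.load_state_dict(checkpoint['model'])  # weights are stored under the 'model' key
model.eval()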

Analysis Repository

This model is part of a comprehensive analysis project. The full analysis code, scripts, and additional models are available at:

  • GitHub: [Your GitHub Repository]
  • Paper/Report: [If available]

Analysis Scripts Include:

  • Learning curve analysis across data fractions
  • Robustness evaluation under various corruptions
  • Geometric invariance testing (rotation, crop, color)
  • Domain shift and transfer learning experiments
  • Representation similarity (CKA, CCA) analysis
  • Gradient-based saliency visualizations
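The CKA comparison listed above can be sketched in a few lines. This is the standard linear CKA of Kornblith et al. (2019), applied to two activation matrices collected on the same inputs; it is an illustration, not the code from the analysis repository, and CCA-based measures are not shown.

```python
import numpy as np


def linear_cka(x, y):
    """Linear CKA between activations x (n x p) and y (n x q) for the same n inputs."""
    x = x - x.mean(axis=0, keepdims=True)  # center each feature over the inputs
    y = y - y.mean(axis=0, keepdims=True)
    cross = np.linalg.norm(y.T @ x, ord="fro") ** 2
    norm_x = np.linalg.norm(x.T @ x, ord="fro")
    norm_y = np.linalg.norm(y.T @ y, ord="fro")
    return float(cross / (norm_x * norm_y))
```

The score is 1 for identical representations and is invariant to orthogonal transforms and isotropic scaling of either input, which is what makes it usable across architectures of different widths such as SHViT, DeiT-Tiny and MobileNetV2. Activations would typically be taken from comparable layers (for example with forward hooks) and flattened to one row per image.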

Citation

If you use this model, please cite the original SHViT paper:

@inproceedings{yun2024shvit,
  author={Yun, Seokju and Ro, Youngmin},
  title={SHViT: Single-Head Vision Transformer with Memory Efficient Macro Design},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  pages={5756--5767},
  year={2024}
}

If you found this fine-tuned model or the analysis useful, please also cite:

@misc{shvit_s2_eurosat,
  author = {Your Name},
  title = {shvit_s2 Fine-tuned on EuroSAT},
  year = {2024},
  publisher = {Hugging Face},
  howpublished = {\url{https://huggingface.co/YOUR_USERNAME/shvit_s2-eurosat}},
}

Collaborators

This work was done in collaboration with:

  • Vishal V
  • Priyal Garg

License

This model is released under the MIT license. The original SHViT implementation is also MIT-licensed.

Acknowledgments

  • Original SHViT authors: Seokju Yun and Youngmin Ro
  • Built using timm
  • Trained models and analysis scripts are available in our repository