---
tags:
- vision
- image-classification
- eurosat
- shvit
- vision-transformer
library_name: timm
license: mit
---

# shvit_s2 Fine-tuned on EuroSAT

This model is a fine-tuned version of **SHViT (Single-Head Vision Transformer)** on the **EuroSAT** dataset.

SHViT was introduced in the CVPR 2024 paper [SHViT: Single-Head Vision Transformer with Memory Efficient Macro Design](https://arxiv.org/abs/2401.16456) by Seokju Yun and Youngmin Ro.

## Model Description

- **Base Model:** shvit_s2
- **Fine-tuned Dataset:** EuroSAT
- **Number of Classes:** 10
- **Input Resolution:** 224x224
- **Framework:** PyTorch / timm

## Performance

- **Test Accuracy (100% data):** 93.83%
- **Data Efficiency Score:** 0.728
- **Data for 90% Performance:** 55.0%

## Dataset

**EuroSAT**: satellite image classification with 10 land use classes.

- **Classes:** 10
- **Image Size:** 64x64 → 224x224 (resized)

## Training Details

This model was trained as part of a comprehensive analysis comparing SHViT with baseline models (DeiT-Tiny, MobileNetV2) across multiple dimensions:

- **Robustness** to corruptions (noise, blur, weather effects)
- **Data efficiency** across different training data fractions
- **Geometric invariance** (rotation, crop, color changes)
- **Domain adaptation** capabilities
- **Representation similarity** analysis

Training configuration:

- Optimizer: AdamW
- Learning rate schedule: cosine decay
- Augmentation: RandAugment, Random Erasing
- Input size: 224×224

## Usage

```python
import torch
from timm import create_model

# Create the model architecture (weights are loaded from the checkpoint below)
model = create_model('shvit_s2', num_classes=10, pretrained=False)

# Load the fine-tuned checkpoint from the Hugging Face Hub
checkpoint = torch.hub.load_state_dict_from_url(
    'https://huggingface.co/YOUR_USERNAME/shvit_s2-eurosat/resolve/main/checkpoint_99.pth',
    map_location='cpu'
)
model.load_state_dict(checkpoint['model'])
model.eval()

# Use for inference (see the preprocessing example below)
```
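The exact inference-time preprocessing is not spelled out in this card, so the snippet below is a minimal sketch that assumes a plain 224×224 resize and standard ImageNet mean/std normalization; the file name `eurosat_patch.jpg` is a placeholder. It continues from the `model` created above, and the transform should be adjusted to match whatever was used during fine-tuning.

```python
from PIL import Image
import torch
from torchvision import transforms

# Assumed preprocessing: resize EuroSAT's 64x64 patches to 224x224 and apply
# ImageNet mean/std normalization (adjust if your training transform differed).
preprocess = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

image = Image.open('eurosat_patch.jpg').convert('RGB')  # placeholder path
batch = preprocess(image).unsqueeze(0)                  # shape: (1, 3, 224, 224)

with torch.no_grad():
    logits = model(batch)
predicted_class = logits.argmax(dim=1).item()
print(predicted_class)
```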
Or download the checkpoint with the Hugging Face Hub client:

```python
from huggingface_hub import hf_hub_download
import torch

# Download the checkpoint from the Hub
checkpoint_path = hf_hub_download(
    repo_id="YOUR_USERNAME/shvit_s2-eurosat",
    filename="checkpoint_99.pth"
)

# Load the checkpoint (requires timm and the SHViT model definition)
checkpoint = torch.load(checkpoint_path, map_location='cpu')
# ... load into your model
```

## Analysis Repository

This model is part of a comprehensive analysis project. The full analysis code, scripts, and additional models are available at:

- **GitHub:** [Your GitHub Repository]
- **Paper/Report:** [If available]

### Analysis Scripts Include:

- Learning curve analysis across data fractions
- Robustness evaluation under various corruptions
- Geometric invariance testing (rotation, crop, color)
- Domain shift and transfer learning experiments
- Representation similarity (CKA, CCA) analysis
- Gradient-based saliency visualizations

## Citation

If you use this model, please cite the original SHViT paper:

```bibtex
@inproceedings{yun2024shvit,
  author    = {Yun, Seokju and Ro, Youngmin},
  title     = {SHViT: Single-Head Vision Transformer with Memory Efficient Macro Design},
  booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  pages     = {5756--5767},
  year      = {2024}
}
```

And if you found this fine-tuned model or analysis useful:

```bibtex
@misc{shvit_s2_eurosat,
  author       = {Your Name},
  title        = {shvit_s2 Fine-tuned on EuroSAT},
  year         = {2024},
  publisher    = {Hugging Face},
  howpublished = {\url{https://huggingface.co/YOUR_USERNAME/shvit_s2-eurosat}},
}
```

## Collaborators

This work was done in collaboration with:

- Vishal V
- Priyal Garg

## License

This model is released under the MIT license. The original SHViT implementation is also licensed under MIT.

## Acknowledgments

- Original SHViT authors: Seokju Yun and Youngmin Ro
- Built using [timm](https://github.com/rwightman/pytorch-image-models)
- Trained models and analysis scripts are available in our repository