---
tags:
- vision
- image-classification
- eurosat
- shvit
- vision-transformer
library_name: timm
license: mit
---

# shvit_s2 Fine-tuned on EuroSAT

This model is a fine-tuned version of **SHViT (Single-Head Vision Transformer)** on the **EuroSAT** dataset.

SHViT was introduced in the CVPR 2024 paper [SHViT: Single-Head Vision Transformer with Memory Efficient Macro Design](https://arxiv.org/abs/2401.16456) by Seokju Yun and Youngmin Ro.

## Model Description

- **Base Model:** shvit_s2
- **Fine-tuned Dataset:** EuroSAT
- **Number of Classes:** 10
- **Input Resolution:** 224x224
- **Framework:** PyTorch / timm

## Performance

- **Test Accuracy (100% data):** 93.83%
- **Data Efficiency Score:** 0.728
- **Data for 90% Performance:** 55.0%

## Dataset

**EuroSAT**: satellite image classification with 10 land use classes.

- **Classes:** 10
- **Image Size:** 64x64 → 224x224 (resized)

## Training Details

This model was trained as part of a comprehensive analysis comparing SHViT with baseline models (DeiT-Tiny, MobileNetV2) across multiple dimensions:

- **Robustness** to corruptions (noise, blur, weather effects)
- **Data efficiency** across different training data fractions
- **Geometric invariance** (rotation, crop, color changes)
- **Domain adaptation** capabilities
- **Representation similarity** analysis

Training configuration:

- Optimizer: AdamW
- Learning rate schedule: cosine decay
- Augmentation: RandAugment, Random Erasing
- Input size: 224×224

## Usage

```python
import torch
from timm import create_model

# Create the model architecture (weights are loaded from the checkpoint below)
model = create_model('shvit_s2', num_classes=10, pretrained=False)

# Load the fine-tuned checkpoint from the Hugging Face Hub
checkpoint = torch.hub.load_state_dict_from_url(
    'https://huggingface.co/YOUR_USERNAME/shvit_s2-eurosat/resolve/main/checkpoint_99.pth',
    map_location='cpu'
)
model.load_state_dict(checkpoint['model'])
model.eval()

# Use for inference (see the preprocessing example below)
```
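The exact inference-time preprocessing is not spelled out in this card, so the snippet below is a minimal sketch that assumes a plain 224×224 resize and standard ImageNet mean/std normalization; the file name `eurosat_patch.jpg` is a placeholder. It continues from the `model` created above, and the transform should be adjusted to match whatever was used during fine-tuning.

```python
from PIL import Image
import torch
from torchvision import transforms

# Assumed preprocessing: resize EuroSAT's 64x64 patches to 224x224 and apply
# ImageNet mean/std normalization (adjust if your training transform differed).
preprocess = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

image = Image.open('eurosat_patch.jpg').convert('RGB')  # placeholder path
batch = preprocess(image).unsqueeze(0)                  # shape: (1, 3, 224, 224)

with torch.no_grad():
    logits = model(batch)
predicted_class = logits.argmax(dim=1).item()
print(predicted_class)
```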
Or download the checkpoint with the Hugging Face Hub client:

```python
from huggingface_hub import hf_hub_download
import torch

# Download the checkpoint from the Hub
checkpoint_path = hf_hub_download(
    repo_id="YOUR_USERNAME/shvit_s2-eurosat",
    filename="checkpoint_99.pth"
)

# Load the checkpoint (requires timm and the SHViT model definition)
checkpoint = torch.load(checkpoint_path, map_location='cpu')
# ... load into your model
```

## Analysis Repository

This model is part of a comprehensive analysis project. The full analysis code, scripts, and additional models are available at:

- **GitHub:** [Your GitHub Repository]
- **Paper/Report:** [If available]

### Analysis Scripts Include:

- Learning curve analysis across data fractions
- Robustness evaluation under various corruptions
- Geometric invariance testing (rotation, crop, color)
- Domain shift and transfer learning experiments
- Representation similarity (CKA, CCA) analysis
- Gradient-based saliency visualizations

## Citation

If you use this model, please cite the original SHViT paper:

```bibtex
@inproceedings{yun2024shvit,
  author    = {Yun, Seokju and Ro, Youngmin},
  title     = {SHViT: Single-Head Vision Transformer with Memory Efficient Macro Design},
  booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  pages     = {5756--5767},
  year      = {2024}
}
```

And if you found this fine-tuned model or analysis useful:

```bibtex
@misc{shvit_s2_eurosat,
  author       = {Your Name},
  title        = {shvit_s2 Fine-tuned on EuroSAT},
  year         = {2024},
  publisher    = {Hugging Face},
  howpublished = {\url{https://huggingface.co/YOUR_USERNAME/shvit_s2-eurosat}},
}
```

## Collaborators

This work was done in collaboration with:

- Vishal V
- Priyal Garg

## License

This model is released under the MIT license. The original SHViT implementation is also licensed under MIT.

## Acknowledgments

- Original SHViT authors: Seokju Yun and Youngmin Ro
- Built using [timm](https://github.com/rwightman/pytorch-image-models)
- Trained models and analysis scripts are available in our repository