fisheye8k_jozhang97_deta-swin-large
This model is a fine-tuned version of jozhang97/deta-swin-large on the Fisheye8K dataset. It was developed as part of the Mcity Data Engine project, an open-source system designed for iterative model improvement through open-vocabulary data selection.
It achieves the following results on the evaluation set:
- Loss: 17.9701
Paper: Mcity Data Engine: Iterative Model Improvement Through Open-Vocabulary Data Selection (arXiv:2504.21614)
Project Page: Mcity Data Engine Docs
Code: GitHub Repository
Model description
This model is a key component of the Mcity Data Engine, a comprehensive, open-source system for the complete data-based development cycle of machine learning models. It specifically targets challenges in Intelligent Transportation Systems (ITS), where the goal is to detect rare and novel classes in vast amounts of unlabeled data, such as those generated by vehicle fleets and roadside perception systems.
fisheye8k_jozhang97_deta-swin-large is an object detection model fine-tuned using the Mcity Data Engine's methodologies. It was trained on fisheye camera data and detects object categories relevant to ITS. The engine supports iterative model improvement by selecting and labeling data, with a focus on long-tail classes.
Intended uses & limitations
Intended Uses: This model is primarily intended for object detection tasks within Intelligent Transportation Systems (ITS). It is designed to identify objects such as Bus, Bike, Car, Pedestrian, and Truck in visual data, particularly from fisheye camera perspectives, as part of the iterative data selection and model training processes facilitated by the Mcity Data Engine. It serves as a practical demonstration and artifact of the engine's capabilities.
Limitations: The model was fine-tuned on a single dataset (Fisheye8K), so its performance may vary on data with significantly different characteristics, environmental conditions, or object distributions. It is most useful when integrated into the broader Mcity Data Engine framework, which supports continuous improvement and adaptation to novel classes.
Training and evaluation data
This model was fine-tuned on the Voxel51/fisheye8k dataset, which provides images from fisheye cameras in Intelligent Transportation Systems settings. The training process uses the open-vocabulary data selection capabilities of the Mcity Data Engine to identify and incorporate relevant samples, including rare and long-tail classes. The model detects the following classes: Bus, Bike, Car, Pedestrian, Truck.
Sample Usage
You can use this model directly with the Hugging Face transformers library for object detection:
import requests
import torch
from PIL import Image
from transformers import AutoImageProcessor, AutoModelForObjectDetection

# Load an example image (replace with your fisheye image if available).
# This example uses a standard COCO image for demonstration purposes.
url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw).convert("RGB")

# Load image processor and model from the Hugging Face Hub.
# The repository ID matches the one this model card is published under.
model_name = "mcity-data-engine/fisheye8k_jozhang97_deta-swin-large"
image_processor = AutoImageProcessor.from_pretrained(model_name)
model = AutoModelForObjectDetection.from_pretrained(model_name)
model.eval()

# Process image and get predictions
inputs = image_processor(images=image, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# Post-process outputs to get bounding boxes, labels, and scores
target_sizes = torch.tensor([image.size[::-1]])  # (height, width) for post-processing
results = image_processor.post_process_object_detection(
    outputs, target_sizes=target_sizes, threshold=0.9
)[0]

# Print detected objects
print("Detected objects:")
for score, label, box in zip(results["scores"], results["labels"], results["boxes"]):
    box = [round(i, 2) for i in box.tolist()]
    print(
        f" Detected {model.config.id2label[label.item()]} "
        f"with confidence {round(score.item(), 3)} at location {box}"
    )
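A common follow-up is to keep only certain classes or to apply a different score threshold than the 0.9 used above. The sketch below shows one way to do that on plain Python lists. `filter_detections` is a hypothetical helper, not part of transformers, and the class names passed in are assumed to match the values in `model.config.id2label`; check that mapping for the exact spelling.

```python
from typing import Dict, List, Sequence


def filter_detections(
    scores: Sequence[float],
    labels: Sequence[str],
    boxes: Sequence[Sequence[float]],
    keep_classes: Sequence[str],
    min_score: float = 0.5,
) -> List[Dict[str, object]]:
    """Return detections whose class is in keep_classes and whose score >= min_score."""
    wanted = set(keep_classes)
    kept = []
    for score, label, box in zip(scores, labels, boxes):
        if label in wanted and score >= min_score:
            kept.append({"label": label, "score": score, "box": list(box)})
    # Highest-confidence detections first
    kept.sort(key=lambda d: d["score"], reverse=True)
    return kept


# Made-up values for illustration only; this is not real model output.
example = filter_detections(
    scores=[0.95, 0.40, 0.72],
    labels=["Car", "Bus", "Pedestrian"],
    boxes=[[10, 20, 110, 90], [0, 0, 50, 50], [200, 40, 230, 120]],
    keep_classes=["Car", "Pedestrian"],
    min_score=0.5,
)
print([d["label"] for d in example])
```

With the `results` dictionary from the usage snippet, the inputs could be built with `results["scores"].tolist()`, `[model.config.id2label[i] for i in results["labels"].tolist()]`, and `results["boxes"].tolist()`.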
Training procedure
Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 5e-05
- train_batch_size: 1
- eval_batch_size: 8
- seed: 0
- optimizer: AdamW (OptimizerNames.ADAMW_TORCH) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
- lr_scheduler_type: cosine
- num_epochs: 36
- mixed_precision_training: Native AMP
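To give a feel for what a cosine schedule with these settings looks like, here is a minimal sketch in plain Python. It assumes a half-cosine decay from the base learning rate to zero with no warmup (no warmup setting is listed above), and it takes the steps-per-epoch value from the training results table in this card. `cosine_lr` is a hypothetical helper for illustration, not the scheduler code used for training.

```python
import math

BASE_LR = 5e-5          # learning_rate from the hyperparameter list
STEPS_PER_EPOCH = 5288  # step count after epoch 1 in the training results table
NUM_EPOCHS = 36         # num_epochs from the hyperparameter list
TOTAL_STEPS = STEPS_PER_EPOCH * NUM_EPOCHS


def cosine_lr(step: int, base_lr: float = BASE_LR, total_steps: int = TOTAL_STEPS) -> float:
    """Half-cosine decay from base_lr to 0; assumes no warmup (an assumption)."""
    progress = min(max(step / total_steps, 0.0), 1.0)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


# Learning rate at the start, at the last logged epoch (7), midway, and at the end
for epoch in (0, 7, 18, 36):
    print(epoch, f"{cosine_lr(epoch * STEPS_PER_EPOCH):.3e}")
```

Under these assumptions the learning rate has decayed only slightly by epoch 7, the last epoch reported in the training results.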
Training results
| Training Loss | Epoch | Step | Validation Loss |
|---|---|---|---|
| 13.7551 | 1.0 | 5288 | 17.5573 |
| 12.6537 | 2.0 | 10576 | 17.4879 |
| 12.023 | 3.0 | 15864 | 17.6520 |
| 11.4167 | 4.0 | 21152 | 18.5138 |
| 10.8161 | 5.0 | 26440 | 17.7264 |
| 10.5346 | 6.0 | 31728 | 17.9145 |
| 10.1203 | 7.0 | 37016 | 17.9701 |
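The table can be read more easily with a few lines of Python. The sketch below copies the logged rows and picks out the epoch with the lowest validation loss, the last logged epoch, and the number of optimizer steps per epoch. The variable names are illustrative only.

```python
# (epoch, step, training_loss, validation_loss) copied from the table above
rows = [
    (1, 5288, 13.7551, 17.5573),
    (2, 10576, 12.6537, 17.4879),
    (3, 15864, 12.0230, 17.6520),
    (4, 21152, 11.4167, 18.5138),
    (5, 26440, 10.8161, 17.7264),
    (6, 31728, 10.5346, 17.9145),
    (7, 37016, 10.1203, 17.9701),
]

best = min(rows, key=lambda r: r[3])      # row with the lowest validation loss
last = rows[-1]                           # last logged epoch
steps_per_epoch = rows[0][1] // rows[0][0]

print(f"lowest validation loss: {best[3]} at epoch {best[0]}")
print(f"last logged validation loss: {last[3]} at epoch {last[0]}")
print(f"steps per epoch: {steps_per_epoch}")
```

In the logged rows, the training loss falls every epoch while the validation loss is lowest at epoch 2. The loss reported at the top of this card (17.9701) corresponds to the last logged epoch, not the lowest validation loss.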
Framework versions
- Transformers 4.48.3
- Pytorch 2.5.1+cu124
- Datasets 3.2.0
- Tokenizers 0.21.0
Acknowledgements
Mcity would like to thank Amazon Web Services (AWS) for their pivotal role in providing the cloud infrastructure on which the Data Engine depends. We couldn’t have done it without their tremendous support!
Citation
If you use the Mcity Data Engine in your research, feel free to cite the project:
@article{bogdoll2025mcitydataengine,
  title={Mcity Data Engine: Iterative Model Improvement Through Open-Vocabulary Data Selection},
  author={Bogdoll, Daniel and Anata, Rajanikant Patnaik and Stevens, Gregory},
  journal={arXiv preprint arXiv:2504.21614},
  year={2025}
}