YOLOv11 Complete On-device Study
- {NPU vs GPU vs CPU} Across All Model Variants
We've just completed comprehensive benchmarking of the entire YOLOv11 family on ZETIC.MLange.
Here's what every ML engineer needs to know.
Key Findings Across 5 Model Variants (XL to Nano):
1. NPU Dominance in Efficiency:
- YOLOv11n: 1.72ms on NPU vs 53.60ms on CPU (31x faster)
- Memory footprint: 0-65MB across all variants
- Consistent sub-10ms inference even on XL models
2. The Sweet Spot - YOLOv11s:
- NPU: 3.23ms @ 95.57% mAP
- Perfect balance: 36MB model, production-ready speed
- 10x faster than GPU, 30x faster than CPU
3. Surprising Discovery:
Medium models (YOLOv11m) show an unusual GPU performance pattern: the NPU outperforms the GPU by nearly 4x (9.55ms vs 35.82ms), suggesting current GPU kernels aren't optimized for mid-size architectures.
4. Production Insights:
- XL/Large: GPU still competitive for batch processing
- Small/Nano: NPU absolutely crushes everything else
- Memory scaling: Linear from 10MB (Nano) to 217MB (XL)
- Accuracy plateau: 95.5-95.7% mAP across S/M/L variants
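The headline speedups above can be sanity-checked directly from the posted latencies. A minimal sketch (the numbers come from this post; the table layout and helper name are ours):

```python
# Per-device latencies in milliseconds, as posted in the findings above.
LATENCY_MS = {
    "YOLOv11n": {"NPU": 1.72, "CPU": 53.60},
    "YOLOv11m": {"NPU": 9.55, "GPU": 35.82},
}

def speedup(model: str, fast: str, slow: str) -> float:
    """Ratio of the slower device's latency to the faster device's."""
    return LATENCY_MS[model][slow] / LATENCY_MS[model][fast]

print(round(speedup("YOLOv11n", "NPU", "CPU"), 1))  # 31.2 -> the ~31x Nano claim
print(round(speedup("YOLOv11m", "NPU", "GPU"), 1))  # 3.8  -> the ~4x Medium claim
```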
Real-world Impact:
For edge deployment, YOLOv11s on NPU delivers server-level accuracy at embedded speeds. This changes everything for real-time applications.
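For readers who want to reproduce latency numbers like these on their own hardware, the usual recipe is warmup iterations followed by the median over many timed runs. A minimal sketch, not the ZETIC.MLange harness; `run_inference` is a stand-in for any model call:

```python
import statistics
import time

def benchmark_ms(run_inference, warmup: int = 10, runs: int = 100) -> float:
    """Median wall-clock latency of run_inference() in milliseconds.

    Warmup iterations let caches, JIT compilation, and clock scaling settle
    before timing; the median is less sensitive to scheduler jitter than the mean.
    """
    for _ in range(warmup):
        run_inference()
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        run_inference()
        samples.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(samples)

# Usage with a stand-in workload (swap in an actual model invocation):
print(f"{benchmark_ms(lambda: sum(range(10_000))):.3f} ms")
```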
Test these benchmarks yourself: https://mlange.zetic.ai/p/Steve/YOLOv11_comparison?tab=versions&version=5
Full benchmark suite available now.
The data speaks for itself.
NPUs aren't the future - they're the present for efficient inference.
Which variant fits your use case? Let's discuss in the comments.