yeonseok-zeticai posted an update Sep 23
YOLOv11 Complete On-device Study
- NPU vs GPU vs CPU Across All Model Variants

We've just completed comprehensive benchmarking of the entire YOLOv11 family on ZETIC.MLange.
Here's what every ML engineer needs to know.

📊 Key Findings Across 5 Model Variants (XL to Nano):

1. NPU Dominance in Efficiency:
- YOLOv11n: 1.72ms on NPU vs 53.60ms on CPU (31x faster; see the timing sketch after this list)
- Memory footprint: 0-65MB across all variants
- Consistent sub-10ms inference even on XL models
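
Want to sanity-check the CPU baseline on your own machine? Below is a minimal timing sketch using the open-source Ultralytics package. The weight filename, dummy input, and warmup/run counts are illustrative assumptions; the NPU/GPU figures above come from ZETIC.MLange, whose SDK calls aren't shown here.

```python
# Minimal CPU latency check with the open-source Ultralytics package.
# Assumptions: "yolo11n.pt" weights, a dummy 640x640 frame, 10 warmup
# and 100 measured runs. This reproduces only the CPU baseline; the
# NPU numbers in the post come from ZETIC.MLange.
import time

import numpy as np
from ultralytics import YOLO

model = YOLO("yolo11n.pt")  # YOLOv11 nano weights (downloaded on first use)
frame = np.zeros((640, 640, 3), dtype=np.uint8)  # dummy input frame

# Warm up so one-time setup costs don't skew the measurement.
for _ in range(10):
    model.predict(frame, device="cpu", verbose=False)

runs = 100
start = time.perf_counter()
for _ in range(runs):
    model.predict(frame, device="cpu", verbose=False)
elapsed_ms = (time.perf_counter() - start) * 1000 / runs
print(f"YOLOv11n mean CPU latency: {elapsed_ms:.2f} ms/frame")
```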

2. The Sweet Spot - YOLOv11s:
- NPU: 3.23ms @ 95.57% mAP
- Perfect balance: 36MB model, production-ready speed
- 10x faster than GPU, 30x faster than CPU

3. Surprising Discovery:
Medium models (YOLOv11m) show unusual GPU performance patterns: NPU outperforms GPU by 4x (9.55ms vs 35.82ms), suggesting current GPU kernels aren't optimized for mid-size architectures.

4. Production Insights:
- XL/Large: GPU still competitive for batch processing
- Small/Nano: NPU absolutely crushes everything else (see the routing sketch after this list)
- Memory scaling: Linear from 10MB (Nano) to 217MB (XL)
- Accuracy plateau: 95.5-95.7% mAP across S/M/L variants
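
To make the routing rule these insights imply concrete, here's a toy sketch. The function, device strings, and variant sets are hypothetical and not part of ZETIC.MLange or any SDK; they only encode the guidance above.

```python
# Toy device-routing rule encoding the production insights above:
# small variants (and any real-time workload) go to the NPU, while
# XL/Large batch jobs can stay on the GPU. Purely illustrative.
def pick_device(variant: str, realtime: bool) -> str:
    """Map a YOLOv11 variant and workload type to a target processor."""
    small = variant in {"yolo11n", "yolo11s"}
    if realtime or small:
        return "npu"  # sub-10ms single-frame inference
    return "gpu"      # batch throughput still competitive on XL/Large

assert pick_device("yolo11s", realtime=True) == "npu"
assert pick_device("yolo11x", realtime=False) == "gpu"
```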

Real-world Impact:
For edge deployment, YOLOv11s on NPU delivers server-level accuracy at embedded speeds. This changes everything for real-time applications.

🔗 Test these benchmarks yourself: https://mlange.zetic.ai/p/Steve/YOLOv11_comparison?tab=versions&version=5

📈 Full benchmark suite available now

The data speaks for itself.
NPUs aren't the future; they're the present for efficient inference.
Which variant fits your use case? Let's discuss in the comments.

Cool, we also deployed YOLOv12 and other generative AI models on NPU. Try it if you are interested: https://sdk.nexa.ai/model/YOLOv12%E2%80%91N


Thanks for sharing this! Great to see that you made this work in PC & laptop environments. I'll bring YOLOv12 to a mobile environment with ZETIC.MLange and let you know!