Add the details of Hardware & Performance
#23 by Apurv09042005
Hardware & Performance
Qwen3-8B is optimized for both research and production workloads:
Inference:
- Runs efficiently on a single A100 (80GB) GPU or on two A40s.
- Can be quantized (INT8/FP8/4-bit) using tools such as bitsandbytes, AutoGPTQ, or AWQ to fit edge or consumer hardware (e.g., RTX 3090/4090); see the loading sketch below.
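A minimal loading sketch with 🤗 Transformers and bitsandbytes, assuming the Hub ID `Qwen/Qwen3-8B`; the NF4 settings are illustrative defaults, not official recommendations:

```python
# Sketch: loading Qwen3-8B for inference. The Hub ID and NF4 settings below
# are illustrative assumptions, not official recommendations.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "Qwen/Qwen3-8B"
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Option A: bfloat16 on a large GPU (e.g., a single A100 80GB).
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# Option B: 4-bit NF4 quantization via bitsandbytes for consumer GPUs
# (e.g., RTX 3090/4090). Use this *instead of* Option A.
# bnb_config = BitsAndBytesConfig(
#     load_in_4bit=True,
#     bnb_4bit_quant_type="nf4",
#     bnb_4bit_compute_dtype=torch.bfloat16,
# )
# model = AutoModelForCausalLM.from_pretrained(
#     model_id, quantization_config=bnb_config, device_map="auto"
# )

prompt = "Give me a short introduction to large language models."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```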
Training / Fine-tuning:
- Recommended: ≥ 2x A100 (80GB) or ≥ 4x A6000 GPUs.
- Supports LoRA, QLoRA, and DPO/RLHF fine-tuning approaches.
- Gradient checkpointing and FlashAttention 2 can be enabled for memory efficiency (a QLoRA setup sketch follows this list).
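A rough QLoRA setup sketch with 🤗 PEFT and bitsandbytes; the LoRA rank/alpha, target modules, and dataset-free scope are illustrative assumptions, and it is meant to be paired with your preferred Trainer/SFTTrainer loop:

```python
# Sketch: QLoRA-style fine-tuning setup for Qwen3-8B with PEFT + bitsandbytes.
# LoRA rank/alpha and target modules are illustrative assumptions;
# flash-attn must be installed separately for attn_implementation below.
import torch
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen3-8B",
    quantization_config=BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_quant_type="nf4",
        bnb_4bit_compute_dtype=torch.bfloat16,
    ),
    attn_implementation="flash_attention_2",
    device_map="auto",
)

# Prepare the quantized base model for training and enable gradient checkpointing
# to trade extra compute for lower activation memory.
model = prepare_model_for_kbit_training(model)
model.gradient_checkpointing_enable()

lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
# Train with transformers.Trainer or trl.SFTTrainer as usual.
```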
| Mode     | GPU Memory | Notes                                        |
|----------|------------|----------------------------------------------|
| FP16     | ~45 GB     | Full-precision inference                     |
| bfloat16 | ~38 GB     | Preferred for stability                      |
| 8-bit    | ~22 GB     | Near-lossless quality                        |
| 4-bit    | ~12 GB     | Trade-off: higher speed, small quality drop  |
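To sanity-check these figures on your own hardware (they vary with context length and batch size), a minimal sketch is to reset and read PyTorch's peak-memory counters around a short generation, again assuming the `Qwen/Qwen3-8B` Hub ID:

```python
# Sketch: measuring peak GPU memory for one loading mode. Actual numbers vary
# with sequence length, batch size, and KV-cache size.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

torch.cuda.reset_peak_memory_stats()

model_id = "Qwen/Qwen3-8B"
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="cuda:0"
)
tokenizer = AutoTokenizer.from_pretrained(model_id)

inputs = tokenizer("Hello", return_tensors="pt").to("cuda:0")
model.generate(**inputs, max_new_tokens=32)

print(f"Peak GPU memory: {torch.cuda.max_memory_allocated() / 1e9:.1f} GB")
```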