Add the details of Hardware & Performance

#23
Files changed (1)
  1. README.md +22 -0
README.md CHANGED
@@ -336,6 +336,28 @@ To achieve optimal performance, we recommend the following settings:
 
  4. **No Thinking Content in History**: In multi-turn conversations, the historical model output should include only the final output part, without the thinking content. This behavior is implemented in the provided Jinja2 chat template. However, for frameworks that do not use the Jinja2 chat template directly, it is up to the developers to ensure that this best practice is followed.
 
+
+ ## Hardware & Performance
+
+ Qwen3-8B is optimized for both research and production workloads:
+
+ - **Inference**:
+   - Runs efficiently on a single A100 (80GB) GPU or on two A40s.
+   - Can be quantized to **INT8/FP8/4-bit** using `bitsandbytes`, `AutoGPTQ`, or `AWQ` for edge or consumer hardware (e.g., RTX 3090/4090); see the loading sketch after this list.
+
+ - **Training / Fine-tuning**:
+   - Recommended: ≥ 2x A100 (80GB) or ≥ 4x A6000 GPUs.
+   - Supports **LoRA, QLoRA, and DPO/RLHF** fine-tuning approaches; a fine-tuning sketch follows the table below.
+   - Gradient checkpointing and FlashAttention v2 are enabled by default for memory efficiency.
+
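+ A minimal sketch of the 4-bit path with `transformers` + `bitsandbytes` (the `Qwen/Qwen3-8B` checkpoint id is assumed for illustration, and `BitsAndBytesConfig` covers only the bitsandbytes route, not `AutoGPTQ`/`AWQ`):
+
+ ```python
+ # Sketch: 4-bit (NF4) loading via transformers + bitsandbytes.
+ # "Qwen/Qwen3-8B" is an assumed checkpoint id; substitute your own path.
+ import torch
+ from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
+
+ quant_config = BitsAndBytesConfig(
+     load_in_4bit=True,
+     bnb_4bit_quant_type="nf4",               # NormalFloat4 weight format
+     bnb_4bit_compute_dtype=torch.bfloat16,   # compute in bf16 for stability
+ )
+
+ model = AutoModelForCausalLM.from_pretrained(
+     "Qwen/Qwen3-8B",
+     quantization_config=quant_config,
+     device_map="auto",                       # place layers on available GPUs
+ )
+ tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen3-8B")
+ ```
+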
+ | Mode     | GPU Memory | Notes                                |
+ |----------|------------|--------------------------------------|
+ | FP16     | ~45GB      | Unquantized half-precision inference |
+ | bfloat16 | ~38GB      | Preferred for numerical stability    |
+ | 8-bit    | ~22GB      | Near-lossless quality                |
+ | 4-bit    | ~12GB      | Faster, with a small quality drop    |
+
+
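+ A hedged sketch of a LoRA fine-tuning setup with `peft`, enabling FlashAttention 2 and gradient checkpointing explicitly (the `target_modules` names and the checkpoint id are assumptions, not this repo's official recipe):
+
+ ```python
+ # Sketch: LoRA adapter via peft, with FlashAttention 2 + gradient checkpointing.
+ # target_modules are assumed projection names; verify against the model code.
+ import torch
+ from transformers import AutoModelForCausalLM
+ from peft import LoraConfig, get_peft_model
+
+ model = AutoModelForCausalLM.from_pretrained(
+     "Qwen/Qwen3-8B",                          # assumed checkpoint id
+     torch_dtype=torch.bfloat16,
+     attn_implementation="flash_attention_2",  # requires the flash-attn package
+     device_map="auto",
+ )
+ model.gradient_checkpointing_enable()         # recompute activations to save memory
+
+ lora_config = LoraConfig(
+     r=16,                                     # adapter rank
+     lora_alpha=32,
+     lora_dropout=0.05,
+     target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
+     task_type="CAUSAL_LM",
+ )
+ model = get_peft_model(model, lora_config)
+ model.print_trainable_parameters()            # LoRA trains a small fraction of weights
+ ```
+
+ Combining the 4-bit loading sketch above with this LoRA config is the usual QLoRA pattern.
+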
  ### Citation
 
  If you find our work helpful, feel free to give us a cite.