# Qwen3-VL-32B-Thinking-EXL3-3.5bpw
ExLlamaV3 quantization of Qwen/Qwen3-VL-32B-Thinking, a vision-language model with enhanced reasoning capabilities.
## Quantization Details
| Parameter | Value |
|---|---|
| Bits per Weight | 3.5 bpw |
| Head Bits | 6 bpw |
| Calibration Rows | 128 |
| Calibration Context | 4096 tokens |
| Format | ExLlamaV3 (EXL3) |
| Size | ~17 GB |
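
As a rough sanity check on the size figure, the bit widths above translate into weight storage roughly as follows; the parameter count and the share of weights kept at the head width are illustrative assumptions, not values read from the repository.

```python
# Back-of-envelope weight-storage estimate for a 3.5 bpw EXL3 quantization.
# The parameter count and head fraction are assumptions for illustration only.
total_params = 32e9          # assumed total parameter count (~32B)
head_fraction = 0.05         # assumed share of weights stored at the head width
body_bpw, head_bpw = 3.5, 6.0

body_bytes = total_params * (1 - head_fraction) * body_bpw / 8
head_bytes = total_params * head_fraction * head_bpw / 8
total_gb = (body_bytes + head_bytes) / 1e9

# ~14-15 GB for quantized weights; embeddings, the vision tower, and file
# overhead bring the on-disk size up toward the ~17 GB listed above.
print(f"Approximate weight storage: {total_gb:.1f} GB")
```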
## Model Capabilities
- Vision + Reasoning: Process images with chain-of-thought analysis
- Thinking Mode: `<think>...</think>` tags for complex visual reasoning
- Context Window: 32K tokens
- Image Support: Single/multiple images, various resolutions
- Video Support: Frame-by-frame analysis
## Hardware Requirements
| GPU | VRAM | Notes |
|---|---|---|
| RTX 4090 | 24 GB | Fits with moderate context + images |
| RTX 3090 | 24 GB | Works, may need lower context with large images |
| A100 40GB | 40 GB | Comfortable for all use cases |
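
For planning how much context fits alongside the weights, a simple KV-cache estimate helps. The sketch below uses placeholder architecture numbers that should be replaced with the values from the model's config.json; with `cache_mode: Q4`, each cached element takes roughly 4 bits.

```python
# Rough KV-cache memory estimate. The architecture numbers are placeholders;
# read the real values from the model's config.json.
num_layers    = 64     # assumed number of transformer layers
num_kv_heads  = 8      # assumed number of key/value heads (GQA)
head_dim      = 128    # assumed per-head dimension
seq_len       = 16384  # max_seq_len from the TabbyAPI config below
bits_per_elem = 4      # cache_mode: Q4 -> roughly 4 bits per cached element

# Keys + values: 2 tensors per layer, each of shape [seq_len, num_kv_heads, head_dim]
cache_bytes = 2 * num_layers * seq_len * num_kv_heads * head_dim * bits_per_elem / 8
print(f"KV cache at {seq_len} tokens: ~{cache_bytes / 1e9:.1f} GB")
# Add this to the ~17 GB of weights plus activation/vision-encoder overhead
# when budgeting VRAM on a 24 GB card.
```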
## Use Cases
- Screenshot Analysis: Understand UI, extract information
- Document OCR: Read and interpret documents with reasoning
- Visual Q&A: Answer questions about images with explanations
- Code from Screenshots: Analyze and explain code in images
## Usage with TabbyAPI
```yaml
# config.yml
model:
  model_dir: models
  model_name: Qwen3-VL-32B-Thinking-EXL3-3.5bpw
network:
  host: 0.0.0.0
  port: 5000
model_defaults:
  max_seq_len: 16384
  cache_mode: Q4
```
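
With TabbyAPI running on the configuration above (and assuming a build with vision support for this model), requests go through its OpenAI-compatible `/v1` endpoint. A minimal sketch with the `openai` Python client is shown below; the base URL, API key, and image URL are placeholders for your own deployment.

```python
# Minimal sketch of a vision request against TabbyAPI's OpenAI-compatible API.
# Base URL, API key, and image URL are placeholders for your own setup.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:5000/v1",  # matches host/port in config.yml
    api_key="YOUR_TABBYAPI_KEY",          # key generated by TabbyAPI on first start
)

response = client.chat.completions.create(
    model="Qwen3-VL-32B-Thinking-EXL3-3.5bpw",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "What is shown in this screenshot?"},
            {"type": "image_url", "image_url": {"url": "https://example.com/screenshot.png"}},
        ],
    }],
    temperature=0.6,   # "Visual Reasoning" settings from the section below
    top_p=0.95,
    max_tokens=1024,
)

print(response.choices[0].message.content)
```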
## Recommended Settings
Visual Reasoning (detailed analysis):
- Temperature: 0.6
- Top-P: 0.95
- Enable thinking mode
Quick Visual Tasks (fast responses):
- Temperature: 0.7
- Top-P: 0.8
- Disable thinking mode
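
When thinking mode is enabled, replies carry the reasoning inside a `<think>...</think>` block ahead of the final answer. A small helper like the sketch below can separate the two before display, assuming the tags appear literally in the returned text.

```python
import re

def split_thinking(text: str) -> tuple[str, str]:
    """Split a response into (reasoning, answer), assuming literal <think> tags."""
    match = re.search(r"<think>(.*?)</think>", text, flags=re.DOTALL)
    if not match:
        return "", text.strip()
    reasoning = match.group(1).strip()
    answer = text[match.end():].strip()
    return reasoning, answer

reasoning, answer = split_thinking("<think>The chart shows a rising trend.</think>It is increasing.")
print(answer)  # "It is increasing."
```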
## Original Model
This is a quantization of Qwen/Qwen3-VL-32B-Thinking. All credit for the base model goes to the Qwen team at Alibaba.
## License
Apache 2.0 (inherited from base model)