Configuration Parsing Warning: In config.json: "quantization_config.bits" must be an integer

Qwen3-VL-32B-Thinking-EXL3-3.5bpw

ExLlamaV3 quantization of Qwen/Qwen3-VL-32B-Thinking - A vision-language model with enhanced reasoning capabilities.

Quantization Details

Parameter Value
Bits per Weight 3.5 bpw
Head Bits 6 bpw
Calibration Rows 128
Calibration Context 4096 tokens
Format ExLlamaV3 (EXL3)
Size ~17 GB

Model Capabilities

  • Vision + Reasoning: Process images with chain-of-thought analysis
  • Thinking Mode: <think>...</think> tags for complex visual reasoning
  • Context Window: 32K tokens
  • Image Support: Single/multiple images, various resolutions
  • Video Support: Frame-by-frame analysis

Hardware Requirements

GPU VRAM Notes
RTX 4090 24 GB Fits with moderate context + images
RTX 3090 24 GB Works, may need lower context with large images
A100 40GB 40 GB Comfortable for all use cases

Use Cases

  • Screenshot Analysis: Understand UI, extract information
  • Document OCR: Read and interpret documents with reasoning
  • Visual Q&A: Answer questions about images with explanations
  • Code from Screenshots: Analyze and explain code in images

Usage with TabbyAPI

# config.yml
model:
  model_dir: models
  model_name: Qwen3-VL-32B-Thinking-EXL3-3.5bpw

network:
  host: 0.0.0.0
  port: 5000

model_defaults:
  max_seq_len: 16384
  cache_mode: Q4

Recommended Settings

Visual Reasoning (detailed analysis):

  • Temperature: 0.6
  • Top-P: 0.95
  • Enable thinking mode

Quick Visual Tasks (fast responses):

  • Temperature: 0.7
  • Top-P: 0.8
  • Disable thinking mode

Original Model

This is a quantization of Qwen/Qwen3-VL-32B-Thinking. All credit for the base model goes to the Qwen team at Alibaba.

License

Apache 2.0 (inherited from base model)

Downloads last month
159
Safetensors
Model size
9B params
Tensor type
F16
I16
BF16
Inference Providers NEW
This model isn't deployed by any Inference Provider. 馃檵 Ask for provider support

Model tree for nullrunner/Qwen3-VL-32B-Thinking-EXL3-3.5bpw

Quantized
(25)
this model