This is a mixed MLX quantization of Qwen 3 235B-A22B, based on the recipe for the ik_llama.cpp quant by ubergarm (https://huggingface.co/ubergarm/Qwen3-235B-A22B-GGUF). In my experience, this quant performs better than a standard 4-bit MLX quant with group size 128.
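
For reference, here is a rough sketch of how a mixed-precision quant can be produced with `mlx_lm.convert`, assuming a recent mlx_lm release that supports a `quant_predicate`. The per-layer bit choices below are illustrative only and are not the exact recipe used for this upload:

```python
# Hedged sketch: mixed-precision quantization with mlx_lm.
# Assumes a recent mlx_lm with `quant_predicate` support; the bit and
# group-size choices here are illustrative, not the actual recipe.
from mlx_lm import convert

def mixed_recipe(path, module, config):
    # Keep more sensitive projections at higher precision,
    # quantize the remaining weights to 4 bits.
    if "attn" in path or "shared_expert" in path:
        return {"bits": 6, "group_size": 64}
    return {"bits": 4, "group_size": 64}

convert(
    "Qwen/Qwen3-235B-A22B",
    mlx_path="Qwen3-235B-A22B-MLX-mixed-4bit",
    quantize=True,
    quant_predicate=mixed_recipe,
)
```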

I can run this quite comfortably on an M3 Max with 128 GB RAM at full context (40k tokens) after raising the GPU wired-memory limit with `sudo sysctl iogpu.wired_limit_mb=121000`.
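
If it helps, a minimal usage sketch with the standard mlx_lm `load`/`generate` API (the prompt and token budget are just examples):

```python
# Minimal usage sketch with mlx_lm; prompt and max_tokens are examples.
from mlx_lm import load, generate

model, tokenizer = load("vlbosch/Qwen3-235B-A22B-MLX-mixed-4bit")

prompt = tokenizer.apply_chat_template(
    [{"role": "user", "content": "Explain mixed-precision quantization in one paragraph."}],
    add_generation_prompt=True,
    tokenize=False,
)

text = generate(model, tokenizer, prompt=prompt, max_tokens=512, verbose=True)
```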

Let me know what your experience is compared to other quants! :-)
