Runs in LM Studio.
No issues — use the latest LM Studio beta (0.3.31)
and the latest runtime extension:
LM Studio MLX
v0.31.0
~50 TPS on an M4 Max.
Seems pretty smart; still testing.
Do you know how this 3bit-DWQ differs from mlx-community/MiniMax-M2-3bit?
Based on file size, and on the quantization bits and group size in config.json, they look identical!
Thanks for any info.
DWQ trains the quantization scales and biases to match the output of the full-precision model. It generally preserves quality better than standard quantization, which just rounds the weights. You can read more about DWQ here: https://github.com/ml-explore/mlx-lm/pull/129
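To make the difference concrete, here is a toy, self-contained sketch of the idea (not the actual mlx-lm implementation, which trains scales and biases by gradient-based distillation on full model outputs). It quantizes a single layer to 3 bits with round-to-nearest, then greedily adjusts the per-row scales so the quantized layer's outputs match the full-precision "teacher" outputs on calibration data — the bits and group size stay identical, only the scales change, which is why the config.json looks the same:

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(8, 8)).astype(np.float32)    # "full precision" layer
X = rng.normal(size=(64, 8)).astype(np.float32)   # calibration inputs
Y = X @ W.T                                       # teacher (fp) outputs

LEVELS = 2**3  # 3-bit quantization

def quantize(W, scale):
    # symmetric round-to-nearest with a per-row scale
    q = np.clip(np.round(W / scale), -LEVELS // 2, LEVELS // 2 - 1)
    return q * scale

def loss(scale):
    # how far the quantized layer's outputs are from the teacher's
    return float(np.mean((X @ quantize(W, scale).T - Y) ** 2))

# RTN baseline: scales chosen from the weight range alone
scale = np.abs(W).max(axis=1, keepdims=True) / (LEVELS // 2)
baseline = loss(scale)

# "DWQ-style" tuning: nudge each scale, keep changes that better
# match the teacher outputs (a crude stand-in for SGD distillation)
for _ in range(50):
    for i in range(scale.shape[0]):
        for factor in (1.02, 0.98):
            trial = scale.copy()
            trial[i] *= factor
            if loss(trial) < loss(scale):
                scale = trial
tuned = loss(scale)
print(tuned <= baseline)  # tuned scales fit the teacher at least as well
```

The file size and quantization config are identical either way; the DWQ checkpoint just ships better scale/bias values.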
@kernelpool thank you for the info! Looks promising indeed.
However, the few eval tables shown in the PR cover only tiny models (Qwen 1.7B & Llama 3.2 1B).
I wonder if there are any new, reliable evals on much larger models.
Yeah, running things like MMLU Pro is quite time-consuming for these larger models, so I tend to just use perplexity testing to get a sense of the improvement (in addition to real-world testing).
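For anyone unfamiliar with the metric: perplexity is just the exponential of the average negative log-likelihood the model assigns to held-out tokens, so it needs only a forward pass over a text corpus rather than a full benchmark harness. A minimal sketch (the function and sample values are illustrative, not from any specific eval tool):

```python
import math

def perplexity(token_logprobs):
    """Perplexity from per-token log-probabilities (natural log).

    Lower is better; a quant that closely tracks the fp model
    should show only a small increase over the fp baseline.
    """
    avg_nll = -sum(token_logprobs) / len(token_logprobs)
    return math.exp(avg_nll)

# sanity check: uniform probability 1/4 per token -> perplexity 4
lp = [math.log(0.25)] * 10
print(perplexity(lp))  # -> 4.0 (approximately)
```

Comparing the perplexity of the quantized model against the full-precision baseline on the same corpus gives a quick, repeatable quality signal even for very large models.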