Runs in LM Studio

#1
by bobig - opened

No issues; use the latest LM Studio beta, 0.3.31,

and latest runtime extension:
LM Studio MLX
v0.31.0

50 TPS on M4 Max.
Seems pretty smart; still testing.

Do you know how this 3-bit DWQ differs from mlx-community/MiniMax-M2-3bit?
Based on the file size, and on the quantization bits and group size in config.json, they look identical!

Thanks for any info

Catalyst Security org

DWQ trains the quantization scales and biases to match the output of the full precision model. It generally preserves quality better than standard quantization which just rounds the weights. You can read more about DWQ here: https://github.com/ml-explore/mlx-lm/pull/129
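To make the idea concrete, here is a minimal NumPy sketch of the difference. It is not mlx-lm's implementation: instead of gradient-training the scales, it just searches over candidate scales for the one that best matches the full-precision output on calibration data, which illustrates why output-matching beats plain round-to-nearest. All names and sizes are made up for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(16, 16))   # stand-in "full precision" weight matrix
x = rng.normal(size=(16, 64))   # stand-in calibration activations
y_ref = W @ x                   # full-precision output we want to match

levels = 2 ** 3                 # 3-bit quantization: 8 levels

def quantize(W, scale):
    """Symmetric round-to-nearest quantization with a given scale."""
    q = np.clip(np.round(W / scale), -(levels // 2), levels // 2 - 1)
    return q * scale

# Standard quantization: scale chosen from the weight range alone.
rtn_scale = np.abs(W).max() / (levels // 2)
rtn_err = np.mean((quantize(W, rtn_scale) @ x - y_ref) ** 2)

# "DWQ-style": pick the scale that minimizes the *output* error on
# calibration data (a crude stand-in for training scales and biases
# against the full-precision model's output).
candidates = np.concatenate([[rtn_scale],
                             rtn_scale * np.linspace(0.5, 1.5, 101)])
errs = [np.mean((quantize(W, s) @ x - y_ref) ** 2) for s in candidates]
dwq_err = min(errs)

print(f"round-to-nearest output MSE: {rtn_err:.4f}")
print(f"tuned-scale output MSE:      {dwq_err:.4f}")
```

Since the round-to-nearest scale is itself one of the candidates, the output-matched variant can only do as well or better on the calibration batch.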

@kernelpool thank you for the info! Looks promising indeed.

However, the few eval tables shown in the PR were run only on tiny models (Qwen 1.7B & Llama 3.2 1B).
I wonder if there are any new reliable evals on much larger models.

Catalyst Security org

Yeah, running things like MMLU Pro is quite time consuming for these larger models, so I tend to just use perplexity testing to get a sense of the improvement (in addition to real world testing).
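For reference, perplexity is just the exponential of the mean negative log-likelihood per token on a held-out text, so comparing a quantized model against the full-precision one is a single pass over the data. A tiny sketch with made-up per-token log-probabilities:

```python
import math

# Hypothetical per-token log-probabilities a model assigned to a
# held-out text (these numbers are illustrative, not from a real run).
logprobs = [-2.1, -0.3, -1.7, -0.9, -2.4]

# Perplexity = exp(mean negative log-likelihood); lower is better.
nll = -sum(logprobs) / len(logprobs)
ppl = math.exp(nll)
print(f"perplexity: {ppl:.2f}")
```

A quantization that tracks the full-precision model closely (as DWQ aims to) should show a perplexity close to the unquantized baseline on the same text.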
