Runs in LM Studio.
No issues — use the latest LM Studio beta (0.3.31)
and the latest runtime extension:
LM Studio MLX
v0.31.0
~50 TPS on an M4 Max.
Seems pretty smart; still testing.
Do you know how this 3bit-DWQ differs from mlx-community/MiniMax-M2-3bit?
Based on file size, and on the quantization bits and group size in config.json, they look identical!
Thanks for any info.
DWQ trains the quantization scales and biases to match the output of the full-precision model. It generally preserves quality better than standard quantization, which just rounds the weights. You can read more about DWQ here: https://github.com/ml-explore/mlx-lm/pull/129
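To make the difference concrete, here is a toy, self-contained sketch of the idea (not the actual mlx-lm implementation, which trains scales and biases by gradient-based distillation on full model outputs). It quantizes a single layer to 3 bits with round-to-nearest, then greedily adjusts the per-row scales so the quantized layer's outputs match the full-precision "teacher" outputs on calibration data — the bits and group size stay identical, only the scales change, which is why the config.json looks the same:

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(8, 8)).astype(np.float32)    # "full precision" layer
X = rng.normal(size=(64, 8)).astype(np.float32)   # calibration inputs
Y = X @ W.T                                       # teacher (fp) outputs

LEVELS = 2**3  # 3-bit quantization

def quantize(W, scale):
    # symmetric round-to-nearest with a per-row scale
    q = np.clip(np.round(W / scale), -LEVELS // 2, LEVELS // 2 - 1)
    return q * scale

def loss(scale):
    # how far the quantized layer's outputs are from the teacher's
    return float(np.mean((X @ quantize(W, scale).T - Y) ** 2))

# RTN baseline: scales chosen from the weight range alone
scale = np.abs(W).max(axis=1, keepdims=True) / (LEVELS // 2)
baseline = loss(scale)

# "DWQ-style" tuning: nudge each scale, keep changes that better
# match the teacher outputs (a crude stand-in for SGD distillation)
for _ in range(50):
    for i in range(scale.shape[0]):
        for factor in (1.02, 0.98):
            trial = scale.copy()
            trial[i] *= factor
            if loss(trial) < loss(scale):
                scale = trial
tuned = loss(scale)
print(tuned <= baseline)  # tuned scales fit the teacher at least as well
```

The file size and quantization config are identical either way; the DWQ checkpoint just ships better scale/bias values.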
@kernelpool thank you for the info! Looks promising indeed.
However, the few eval tables shown in the PR cover only tiny models (Qwen 1.7B & Llama 3.2 1B).
I wonder if there are any new, reliable evals on much larger models.
Yeah, running things like MMLU Pro is quite time-consuming for these larger models, so I tend to just use perplexity testing to get a sense of the improvement (in addition to real-world testing).
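For anyone unfamiliar with the metric: perplexity is just the exponential of the average negative log-likelihood the model assigns to held-out tokens, so it needs only a forward pass over a text corpus rather than a full benchmark harness. A minimal sketch (the function and sample values are illustrative, not from any specific eval tool):

```python
import math

def perplexity(token_logprobs):
    """Perplexity from per-token log-probabilities (natural log).

    Lower is better; a quant that closely tracks the fp model
    should show only a small increase over the fp baseline.
    """
    avg_nll = -sum(token_logprobs) / len(token_logprobs)
    return math.exp(avg_nll)

# sanity check: uniform probability 1/4 per token -> perplexity 4
lp = [math.log(0.25)] * 10
print(perplexity(lp))  # -> 4.0 (approximately)
```

Comparing the perplexity of the quantized model against the full-precision baseline on the same corpus gives a quick, repeatable quality signal even for very large models.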