25% smaller !?!

#1
by bobig - opened

Thanks for applying your special sauce to the pruned experts.

I hope it still works. Perplexity went up from 5-ish to 7-ish with the REAP pruning, and you can't fiddle with the experts anymore.
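For context on what that 5-to-7 jump means: perplexity is just the exponential of the average negative log-likelihood per token, so a PPL of 7 means the model is, on average, about as uncertain as a 7-way guess at each token. A minimal sketch (the function and toy numbers here are illustrative, not from the actual eval):

```python
import math

def perplexity(token_logprobs):
    """Perplexity = exp of the average negative log-likelihood per token."""
    n = len(token_logprobs)
    return math.exp(-sum(token_logprobs) / n)

# Toy example: if every token gets probability 1/7, PPL is exactly 7,
# i.e. the model is as uncertain as a 7-way coin flip at each step.
logps = [math.log(1 / 7)] * 10
print(round(perplexity(logps), 3))  # → 7.0
```

Going from 5-ish to 7-ish is a real but modest loss of predictive sharpness, which matches the "still works, just knows less trivia" feel described below.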

This model has enhanced attention paths customized for GLM, so it should make the model "happier" the same way it worked for the Qwen models.

It would be interesting to measure it against your previous full 4.5-Air quant with the same mixed-precision formula (qx64) that you previously measured against qx5-hi and mxfp4.

Oh, I already did, and it does great.
Running tests takes a long time, and HF is having issues with connection limits; I can't properly upload (or even download) anything today. It seems that when you run a test, those calls also count and clog the pipe, so to speak. And I'm on a Pro account, no less. Weird.

I tested LIMI-Air-qx54g-hi and got some weird numbers (hellaswag = 0.698, piqa = 0.781, winogrande = 0.714, the others around 0.4-ish); I need to test the other quants too. The qx64g-hi would probably work.

Confirmed, it does great. I like it!
Same speed, now 15 GB smaller with REAP.

A mini Claude Sonnet in a tight package that you "could" run on a Mac with 64 GB.

Definitely my favorite REAP.

It is missing some data, of course: with 25% of the experts gone, it won't win Trivial Pursuit. Use an MCP server like Danielsig's DuckDuckGo Search for LM Studio.

It does seem to keep the brains: the logic, the reasoning, and the GLM style.
