25% smaller !?!

#1
by bobig - opened

Thanks for applying your special sauce to the pruned experts.

I hope it still works. Perplexity went up from 5-ish to 7-ish with the REAP pruning, and you can't fiddle with the experts anymore.
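For context on what that 5-to-7 jump means: perplexity is just the exponential of the average negative log-likelihood per token, so a PPL of 7 means the model is, on average, about as uncertain as a 7-way guess at each token. A minimal sketch (the function and toy numbers here are illustrative, not from the actual eval):

```python
import math

def perplexity(token_logprobs):
    """Perplexity = exp of the average negative log-likelihood per token."""
    n = len(token_logprobs)
    return math.exp(-sum(token_logprobs) / n)

# Toy example: if every token gets probability 1/7, PPL is exactly 7,
# i.e. the model is as uncertain as a 7-way coin flip at each step.
logps = [math.log(1 / 7)] * 10
print(round(perplexity(logps), 3))  # → 7.0
```

Going from 5-ish to 7-ish is a real but modest loss of predictive sharpness, which matches the "still works, just knows less trivia" feel described below.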

This model has enhanced attention paths customized for GLM, so it should make the model "happier" the same way it worked for the Qwen models.

It would be interesting to measure it against your previous full 4.5-Air quant with the same mixed-precision formula (qx64) that you previously measured against qx5-hi and mxfp4.

Oh, I already did, and it does great.
Running tests takes a long time, and HF is having issues with connection limits; I can't properly upload (or even download) anything today. It seems that when you run a test, those calls also count and clog the pipe, so to speak. And I'm on a Pro account, no less. Weird.

I tested LIMI-Air-qx54g-hi and got some weird numbers (hellaswag = 0.698, piqa = 0.781, winogrande = 0.714, the others around 0.4-ish); I need to test the other quants too. The qx64g-hi would probably work.

Confirmed, it does great. I like it!
Same speed, now 15 GB smaller with REAP.

A mini Claude Sonnet in a tight package that you "could" run on a Mac with 64 GB.

Definitely my favorite REAP.

It is missing some data, of course: with 25% of the experts gone, it won't win Trivial Pursuit. Use an MCP server like Danielsig's DuckDuckGo Search for LM Studio.

It does seem to keep the brains: the logic, the reasoning, and the GLM style.
