Is there a plan for an FP8 version or a GGUF version that can be used in ComfyUI?

#1 · opened by pymo

Thank you to the author for the open-source spirit, much respect. May I ask: is there a plan for an FP8 version, or a GGUF version that can be used in ComfyUI?

I asked AIs this and got different answers... In your experience, is the 4B at f32 better than the 8B at q8_0? Similarly, the 32B at q4_K_M vs the 8B at f32? Maybe it's a stupid question, but I can't get a straight answer out of anyone...

If long context is the focus, choose the larger model at Q4; if speed is the priority, choose the smaller model at f32.
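
As a rough sanity check on these combinations, here is a back-of-the-envelope estimate of the weight memory each one needs (my own numbers, not from the model card; the bits-per-weight figures for q8_0 and q4_K_M are approximate llama.cpp averages, and KV cache / activations are not included):

```python
# Approximate bits per weight for each format (q8_0 / q4_K_M values are
# rough llama.cpp averages including scales -- assumptions, not exact).
BITS_PER_WEIGHT = {
    "f32": 32.0,
    "f16": 16.0,
    "q8_0": 8.5,
    "q4_K_M": 4.85,
}

def weight_gb(params_billion: float, quant: str) -> float:
    """Estimate GB needed just for the weights."""
    return params_billion * 1e9 * BITS_PER_WEIGHT[quant] / 8 / 1e9

for params, quant in [(4, "f32"), (8, "q8_0"), (8, "f32"), (32, "q4_K_M")]:
    print(f"{params}B @ {quant}: ~{weight_gb(params, quant):.1f} GB")
```

That works out to roughly 16 GB for the 4B at f32 vs ~8.5 GB for the 8B at q8_0, and ~32 GB for the 8B at f32 vs ~19 GB for the 32B at q4_K_M, which is why the larger quantized model is usually the better fit when it fits in memory at all.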

thanks!! that is actually helpful

