Is there a plan for an FP8 version or a GGUF version that can be used in ComfyUI?
#1 opened by pymo
Thank you to the author for the open-source spirit; much respect. May I ask: are there plans for an FP8 version, or a GGUF version that can be used in ComfyUI?
I asked several AIs this and got different answers... In your experience, is f32 of a 4B model better than q8_0 of an 8B? Similarly, q4_K_M of the 32B vs. the 8B in f32? It may be a stupid question, but I can't get a straight answer out of anyone...
If long context and quality are the focus, choose the larger model's Q4 quant; if speed is the priority, choose the smaller model's f32.
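One way to ground this comparison is to estimate on-disk/in-memory size from parameter count and bits per weight. The sketch below uses commonly cited approximate bits-per-weight figures for llama.cpp-style quants (q8_0 ≈ 8.5 bpw, q4_K_M ≈ 4.85 bpw; exact values vary by model architecture), so treat the numbers as rough estimates, not exact file sizes.

```python
# Rough size estimate: params (in billions) * bits per weight / 8 -> GB.
# Bits-per-weight values below are approximate ggml/llama.cpp conventions,
# not exact figures for any specific model.
BITS_PER_WEIGHT = {
    "f32": 32.0,
    "f16": 16.0,
    "q8_0": 8.5,      # 8-bit weights + per-block f16 scale
    "q4_K_M": 4.85,   # commonly cited average for this mixed quant
}

def approx_size_gb(params_billions: float, quant: str) -> float:
    """Approximate model weight size in GB for a given quant type."""
    return params_billions * BITS_PER_WEIGHT[quant] / 8

for params, quant in [(4, "f32"), (8, "q8_0"), (8, "f32"), (32, "q4_K_M")]:
    print(f"{params}B {quant}: ~{approx_size_gb(params, quant):.1f} GB")
```

By this estimate, the 8B at q8_0 (~8.5 GB) is actually smaller than the 4B at f32 (~16 GB), and the 32B at q4_K_M (~19.4 GB) is well under the 8B at f32 (~32 GB), which is why the quantized larger model is usually the better trade when memory allows.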
Thanks!! That is actually helpful.