e4m3fn vs e5m2 for FP8

#5 by NielsGx - opened

Has anyone checked the quality difference between these two for Wan2.1 T2V/I2V?

I know neither is strictly better and it's all about compromises, but maybe one is slightly better in most generations?
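(For context: e4m3fn spends more bits on the mantissa and e5m2 on the exponent, so e4m3fn is more precise over a narrower range while e5m2 covers a wider range more coarsely. A quick, illustrative way to compare the two in PyTorch:)

```python
import torch

# Compare the numeric properties of the two FP8 formats:
# e4m3fn trades dynamic range for precision, e5m2 does the opposite.
for dtype in (torch.float8_e4m3fn, torch.float8_e5m2):
    info = torch.finfo(dtype)
    print(f"{dtype}: max={info.max}, smallest_normal={info.smallest_normal}, eps={info.eps}")
```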

Owner

I've shared e5m2 mostly because it works with torch.compile on pre-4000 series GPUs, while e4m3fn does not.

@Kijai Hi, so can I say there won't be much quality difference?

How can I convert to e5m2?
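(The thread doesn't show the conversion script used for these weights; below is a minimal sketch of a straightforward cast with safetensors, with hypothetical file names:)

```python
import torch
from safetensors.torch import load_file, save_file

# Hypothetical input/output paths, for illustration only.
state_dict = load_file("wan2.1_t2v_fp16.safetensors")

converted = {}
for name, tensor in state_dict.items():
    # Cast only the large floating-point weight matrices; keep biases,
    # norm parameters, and non-float tensors at their original precision.
    if tensor.is_floating_point() and tensor.ndim >= 2:
        converted[name] = tensor.to(torch.float8_e5m2)
    else:
        converted[name] = tensor

save_file(converted, "wan2.1_t2v_e5m2.safetensors")
```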

Is e5m2 what Wan calls "scaled" in the native repo?

No, the Comfy scaled models for Wan are e4m3fn only. The only difference from mine is how the scaling is calculated.
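(For reference, "scaled" fp8 checkpoints generally store a per-tensor scale alongside the fp8 weights so the values can use the format's full range. A minimal sketch of the idea, assuming simple per-tensor absmax scaling; this is neither repo's exact method:)

```python
import torch

def quantize_scaled_fp8(w: torch.Tensor, dtype=torch.float8_e4m3fn):
    # Map the tensor's largest magnitude onto the fp8 format's max value,
    # then store the fp8 weights together with the scale that was used.
    fp8_max = torch.finfo(dtype).max
    scale = (w.abs().max().float() / fp8_max).clamp(min=1e-12)
    w_fp8 = (w / scale).to(dtype)
    return w_fp8, scale  # dequantize later as w_fp8.to(w.dtype) * scale
```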

Thank you very much, I'll download and test your model then. I'm new to this and was looking for compatibility with a ComfyUI setup using TeaCache, Triton, PyTorch, and SageAttention, which is supposedly compatible with e5m2 models, so I will test it.

Thank you!

When running FP8 (e4m3fn or e5m2) + lightx2v, I always get ghosting, blending, or whatever it's called; it looks like a hallucination effect. Any idea why? It looks kind of like the image I attached.

I've tested the same workflow with identical settings in both your wrapper and the native workflow; the only thing I changed was switching the model between FP8 (e4m3fn/e5m2) and Q8. With Q8 the output looks clean and stable, but I can't get similar quality when using FP8.

Do you know why this happens? I'd like to use FP8 for faster inference (since Q8 is slower), but I can't get it to work.

(attached: image.png)

Can't say I've seen that happen for me; I've even tested these scaled models against fp16 multiple times and there's never been any difference remotely that large.

I forgot to post these here, the initial test for T2V (the uploaded e5 is the v2 there):

Your comparison looks great, and the quality difference is barely noticeable!

I'm running Wan 2.2 I2V with your scaled model, btw. I just remembered that I had switched the sampler from DPM++ to Euler while keeping the step count at 6, and that's when I got those hallucination effects.

I tested it again just now: FP8 with DPM++ at 6 steps and Euler at 12 steps. The outputs look much better, pretty similar to what I get with Q8 at 6 steps using Euler.

Is that expected? Or should FP8 theoretically perform just as well as Q8 at 6 steps with Euler? For me, Q8 with Euler at 6 steps already looks good, but with FP8 I need to double the steps to get similar quality. So in the end, the inference time is basically the same 😅

And by "looks good," I mean the output doesn't have any of those hallucination issues like the one I showed earlier.

It's really not expected to see such big differences between the weights; all my tests use the same params, changing only the model.

I have a question: during inference, do these fp8 scaled weights still get converted back to a higher precision (FP32) for the matmuls, or, on GPUs that support fp8 matrix multiplication, are they used as fp8 as-is without scaling?
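(For illustration only, not an answer from the repo owner: a common pattern in inference code is to upcast the stored fp8 weights to the activation dtype and re-apply the scale right before each matmul, so no fp8 tensor cores are required. A minimal sketch of that dequantize path, assuming per-tensor scales as in the earlier sketch:)

```python
import torch

def fp8_linear(x: torch.Tensor, w_fp8: torch.Tensor, scale: torch.Tensor):
    # Upcast the stored fp8 weights to the activation dtype and re-apply
    # the per-tensor scale just before the matmul.
    w = w_fp8.to(x.dtype) * scale
    return x @ w.t()
```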
