RUNNING ON RTX 4090

#1
by parthwagh - opened

Which graphics card are you using, wikeeyang?
How long does it take to generate a single image?
Can we run it on an RTX 4090?

Is it possible for you to quantize this model even more?

Hi, I quantized this model on a 4090 48G card. The NF4 model is a little slower than the official bf16 one: about 12 minutes per image versus about 10 minutes for bf16.
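
If you want to reproduce the NF4 loading, here is a minimal sketch using transformers + bitsandbytes; the model ID is a placeholder and the exact loading class depends on how this model is shipped:

```python
# Minimal NF4 loading sketch via transformers + bitsandbytes.
# The model ID and AutoModel class are placeholders, not this repo's exact code.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",              # NF4 data type for the 4-bit weights
    bnb_4bit_compute_dtype=torch.bfloat16,  # matmuls still run in bf16
)

model = AutoModelForCausalLM.from_pretrained(
    "your-org/your-model",                  # hypothetical model ID
    quantization_config=bnb_config,
    device_map="auto",
)
```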
If your RTX 4090 24G machine can use shared system memory with the GPU, you can try it; otherwise I think you need 32G and above.
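
(Shared memory here may mean the driver-level GPU memory fallback on Windows. A software alternative with a similar effect is capping VRAM usage so the overflow spills to CPU RAM via accelerate; the memory split below is an assumption to tune for your machine.)

```python
# Sketch: cap GPU usage on a 24G card and offload the rest to system RAM.
# Model ID and the 22GiB/48GiB split are assumptions, not tested values.
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "your-org/your-model",                     # hypothetical model ID
    device_map="auto",
    max_memory={0: "22GiB", "cpu": "48GiB"},   # leave headroom on the 24G GPU
    torch_dtype="auto",
)
```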
I tried some block-swap techniques for inference, but they don't fit this kind of multimodal model; it's very slow...
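
(For reference, block swap generically means parking transformer blocks in CPU RAM and moving each onto the GPU only for its forward pass. The sketch below is illustrative, not this model's actual code; every block crossing PCIe twice per step is why it gets so slow here.)

```python
# Generic block-swap sketch: keep blocks on CPU, swap each to GPU per forward.
# The `blocks` list is hypothetical; pass the model's transformer layers.
import torch

def enable_block_swap(blocks, device="cuda"):
    for block in blocks:
        block.to("cpu")  # park every block in system RAM

        def pre_hook(module, args):
            module.to(device)  # load this block onto the GPU just in time
            return tuple(
                a.to(device) if torch.is_tensor(a) else a for a in args
            )

        def post_hook(module, args, output):
            module.to("cpu")   # evict it again to free VRAM
            return output

        block.register_forward_pre_hook(pre_hook)
        block.register_forward_hook(post_hook)
```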

Looking forward to the official DiDA version, which should give about 20× faster inference without performance loss...
