RUNNING ON RTX 4090

#1
by parthwagh - opened

Which graphics card are you using, wikeeyang?
How long does it take to generate a single image?
Can we run it on an RTX 4090?

Is it possible for you to quantize this model even more?

Hi, I quantized this model on a 4090 48G card. The NF4 model is a little slower than the official bf16 one: about 12 minutes per image versus about 10 minutes for bf16.
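
If you want to reproduce the NF4 loading, here is a minimal sketch using transformers + bitsandbytes; the model ID is a placeholder and the exact loading class depends on how this model is shipped:

```python
# Minimal NF4 loading sketch via transformers + bitsandbytes.
# The model ID and AutoModel class are placeholders, not this repo's exact code.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",              # NF4 data type for the 4-bit weights
    bnb_4bit_compute_dtype=torch.bfloat16,  # matmuls still run in bf16
)

model = AutoModelForCausalLM.from_pretrained(
    "your-org/your-model",                  # hypothetical model ID
    quantization_config=bnb_config,
    device_map="auto",
)
```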
If your RTX 4090 24G machine can use shared system memory with the GPU, you can try it; otherwise I think you need 32G and above.
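
(Shared memory here may mean the driver-level GPU memory fallback on Windows. A software alternative with a similar effect is capping VRAM usage so the overflow spills to CPU RAM via accelerate; the memory split below is an assumption to tune for your machine.)

```python
# Sketch: cap GPU usage on a 24G card and offload the rest to system RAM.
# Model ID and the 22GiB/48GiB split are assumptions, not tested values.
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "your-org/your-model",                     # hypothetical model ID
    device_map="auto",
    max_memory={0: "22GiB", "cpu": "48GiB"},   # leave headroom on the 24G GPU
    torch_dtype="auto",
)
```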
I tried some block-swap techniques for inference, but they don't fit this kind of multimodal model; it's very slow...
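
(For reference, block swap generically means parking transformer blocks in CPU RAM and moving each onto the GPU only for its forward pass. The sketch below is illustrative, not this model's actual code; every block crossing PCIe twice per step is why it gets so slow here.)

```python
# Generic block-swap sketch: keep blocks on CPU, swap each to GPU per forward.
# The `blocks` list is hypothetical; pass the model's transformer layers.
import torch

def enable_block_swap(blocks, device="cuda"):
    for block in blocks:
        block.to("cpu")  # park every block in system RAM

        def pre_hook(module, args):
            module.to(device)  # load this block onto the GPU just in time
            return tuple(
                a.to(device) if torch.is_tensor(a) else a for a in args
            )

        def post_hook(module, args, output):
            module.to("cpu")   # evict it again to free VRAM
            return output

        block.register_forward_pre_hook(pre_hook)
        block.register_forward_hook(post_hook)
```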

Looking forward to the official DiDA version, which should give about 20× faster inference without performance loss...
