Request for quantized version for 24GB VRAM (and also a gradio GUI demo script)

#4 opened by mingyi456

It would be great if this model could squeeze into 24GB of VRAM with 8-bit quantization, either int8 or fp8. Also, a Python script that provides a Gradio GUI would be much more convenient than running the code manually.

inclusionAI org

The quantized version and demo are on the way; please refer to our response in the GitHub issue: https://github.com/inclusionAI/Ming/issues/12#issuecomment-2977029447
