Request for quantized version for 24GB VRAM (and also a gradio GUI demo script)
#4 opened by mingyi456
It would be great if this model could be squeezed into 24GB of VRAM with 8-bit quantization, either int8 or fp8. Also, a Python script that provides a Gradio GUI would be much more convenient than running the code manually.
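For context, here is a rough sketch of the kind of 8-bit loading I have in mind, using bitsandbytes through Transformers. The repo id below and the assumption that the model loads through `AutoModel` with `trust_remote_code` are guesses on my part, not confirmed details:

```python
# Sketch: load the model in 8-bit via bitsandbytes to reduce VRAM use.
# MODEL_ID is a placeholder; adjust it to the actual Hugging Face repo.
import torch
from transformers import AutoModel, AutoProcessor, BitsAndBytesConfig

MODEL_ID = "inclusionAI/Ming-Lite-Omni"  # hypothetical repo id

# Quantize linear layers to int8 at load time.
quant_config = BitsAndBytesConfig(load_in_8bit=True)

model = AutoModel.from_pretrained(
    MODEL_ID,
    quantization_config=quant_config,
    torch_dtype=torch.float16,  # dtype for the non-quantized modules
    device_map="auto",          # spread layers across available devices
    trust_remote_code=True,     # assumed necessary for custom model code
)
processor = AutoProcessor.from_pretrained(MODEL_ID, trust_remote_code=True)
```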
mingyi456 changed the discussion title from "Request for quantized version for 24GB VRAM (also a gradio GUI demo script)" to "Request for quantized version for 24GB VRAM (and also a gradio GUI demo script)"
The quantized version and demo are on the way; please refer to our response to the GitHub issue: https://github.com/inclusionAI/Ming/issues/12#issuecomment-2977029447
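While waiting for the official demo, a minimal Gradio wrapper might look like the sketch below. `generate_response` here is a hypothetical placeholder, not the model's actual inference API:

```python
# Sketch: wrap a text-generation callable in a simple Gradio chat UI.
import gradio as gr

def generate_response(message, history):
    # Placeholder: replace with the model's real inference call,
    # e.g. encoding `message` with the processor and calling generate().
    return f"(model output for: {message})"

# ChatInterface handles the chat layout and message history for us.
demo = gr.ChatInterface(fn=generate_response, title="Ming demo (sketch)")

if __name__ == "__main__":
    demo.launch()
```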