Thank you to the InclusionAI team for this excellent open-source model; both the inference speed and the metrics are incredible. Can I run inference on an RTX 3090 with 24 GB of VRAM via vLLM?
Yes, you can.
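As a rough sketch, an offline-inference setup on a single 24 GB card might look like the following. The model id is a placeholder (substitute the actual InclusionAI repo), and the `dtype`, `max_model_len`, and `gpu_memory_utilization` values are assumptions you may need to tune for the specific checkpoint:

```python
# Minimal vLLM sketch for a single 24 GB GPU (RTX 3090).
from vllm import LLM, SamplingParams

MODEL_ID = "inclusionAI/<model-id>"  # placeholder: use the actual repo id

llm = LLM(
    model=MODEL_ID,
    dtype="bfloat16",             # half precision to fit in 24 GB (assumption)
    max_model_len=8192,           # cap context length to bound KV-cache memory (assumption)
    gpu_memory_utilization=0.90,  # leave a little headroom on the card
)

outputs = llm.generate(
    ["Hello, who are you?"],
    SamplingParams(temperature=0.7, max_tokens=256),
)
print(outputs[0].outputs[0].text)
```

If you want an OpenAI-compatible server instead, the equivalent would be roughly `vllm serve <model-id> --dtype bfloat16 --max-model-len 8192 --gpu-memory-utilization 0.9`, again with the same caveats about tuning the values for your checkpoint.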