About inference

#1 · opened by AnnLi0507

Thank you to the InclusionAI team for this excellent open-source model—both the inference speed and the metrics are incredible. Can I run inference on an RTX 3090 with 24 GB of VRAM via vLLM?

Yes, you can, as long as the weights and KV cache fit in 24 GB. In half precision (bf16/fp16) the weights alone take roughly 2 bytes per parameter, so larger checkpoints may need a quantized variant or a reduced context length to leave room for the KV cache.
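
For concreteness, here's a minimal sketch of one way to load the model with vLLM's Python API on a single 24 GB card. The repo id is a placeholder (substitute the id from the model card), and the memory-utilization and context-length values are assumptions chosen to bound the KV cache, not settings recommended by the InclusionAI team:

```python
from vllm import LLM, SamplingParams

# Placeholder repo id -- replace with the actual id from the model card.
MODEL_ID = "inclusionAI/<model-name>"

llm = LLM(
    model=MODEL_ID,
    dtype="bfloat16",             # half precision: ~2 bytes per parameter
    gpu_memory_utilization=0.90,  # assumed fraction of VRAM vLLM may claim
    max_model_len=8192,           # assumed cap on context to bound the KV cache
)

params = SamplingParams(temperature=0.7, max_tokens=256)
outputs = llm.generate(["What is vLLM?"], params)
print(outputs[0].outputs[0].text)
```

The CLI equivalent would be `vllm serve <model-id> --max-model-len 8192 --gpu-memory-utilization 0.90`, which exposes an OpenAI-compatible server instead of an in-process engine.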
