Models: 220

Active filters: vLLM

mistralai/Mistral-Medium-3.5-128B

128B • Updated 27 days ago • 333k • 344

mistralai/Mistral-Small-4-119B-2603

119B • Updated Apr 27 • 52.6k • 380

mistralai/Mistral-Small-4-119B-2603-NVFP4

Updated Mar 17 • 1.15k • 90

QuantTrio/Qwen3.6-35B-A3B-AWQ

Image-Text-to-Text • 36B • Updated Apr 17 • 842k • 26

QuantTrio/GLM-5.1-AWQ

Text Generation • 754B • Updated Apr 21 • 1.53k • 8

QuantTrio/Qwen3.6-27B-AWQ

Image-Text-to-Text • 28B • Updated Apr 23 • 907k • 13

RecViking/Mistral-Medium-3.5-128B-NVFP4

74B • Updated 22 days ago • 13.8k • 7

unsloth/Mistral-Small-4-119B-2603-GGUF

119B • Updated Apr 20 • 10.9k • 70

selode-ai/Qwen-3.6-35B-A3B-VRAP-4-bit-AWQ-21.2GB

Image-Text-to-Text • 29B • Updated Apr 21 • 13.2k • 15

mistralai/Mistral-Medium-3.5-128B-EAGLE

Updated about 1 month ago • 518 • 40

bartowski/mistralai_Mistral-Medium-3.5-128B-GGUF

Image-Text-to-Text • 125B • Updated 27 days ago • 12.3k • 8

cyankiwi/Mistral-Medium-3.5-128B-AWQ-INT4

25B • Updated 26 days ago • 18.7k • 3

inferencerlabs/Mistral-Medium-3.5-MLX-9bit

Image-Text-to-Text • Updated 9 days ago • 1.16k • 1

model-scope/glm-4-9b-chat-GPTQ-Int4

Text Generation • 9B • Updated Jul 17, 2024 • 66 • 6

model-scope/glm-4-9b-chat-GPTQ-Int8

Text Generation • 9B • Updated Jul 23, 2024 • 7 • 2

tclf90/qwen2.5-72b-instruct-gptq-int4

Text Generation • 73B • Updated May 12, 2025 • 61 • 2

tclf90/qwen2.5-72b-instruct-gptq-int3

Text Generation • 69B • Updated May 12, 2025 • 61

prithivMLmods/Nu2-Lupi-Qwen-14B

Text Generation • 15B • Updated Mar 27, 2025 • 5 • 2

mradermacher/Nu2-Lupi-Qwen-14B-GGUF

15B • Updated Jul 11, 2025 • 182 • 1

mradermacher/Nu2-Lupi-Qwen-14B-i1-GGUF

15B • Updated Jul 11, 2025 • 513 • 1

JunHowie/Qwen3-0.6B-GPTQ-Int4

Text Generation • 0.6B • Updated Sep 3, 2025 • 210 • 1

JunHowie/Qwen3-0.6B-GPTQ-Int8

Text Generation • 0.6B • Updated Sep 3, 2025 • 9

JunHowie/Qwen3-1.7B-GPTQ-Int4

Text Generation • 2B • Updated Sep 3, 2025 • 2.6k • 1

JunHowie/Qwen3-1.7B-GPTQ-Int8

Text Generation • 2B • Updated Sep 3, 2025 • 16

JunHowie/Qwen3-32B-GPTQ-Int4

Text Generation • 33B • Updated Sep 5, 2025 • 26.3k • 4

JunHowie/Qwen3-32B-GPTQ-Int8

Text Generation • 33B • Updated Sep 5, 2025 • 389 • 4

JunHowie/Qwen3-30B-A3B-GPTQ-Int4

Text Generation • 5B • Updated Sep 6, 2025 • 21 • 1

JunHowie/Qwen3-14B-GPTQ-Int8

Text Generation • 15B • Updated Sep 5, 2025 • 91 • 1

JunHowie/Qwen3-14B-GPTQ-Int4

Text Generation • 15B • Updated Sep 5, 2025 • 87.5k • 4

JunHowie/Qwen3-8B-GPTQ-Int8

Text Generation • 8B • Updated Sep 4, 2025 • 697