How about int8 quantization? — #3, opened 3 months ago by traphix
INT 8 — #2, opened 3 months ago by freegheist
Slow inference on vLLM (3 replies) — #1, opened 3 months ago by hp1337