Can the mmproj be quantized to Q4, and how much quality loss would that cause?

#2
by openmartin - opened

Running on CPU is still very slow: over 100 seconds per image.


There is no Q4 support when using llama.cpp to make the projector file; Q8 is the smallest.
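For reference, the usual workflow keeps the mmproj file separate and quantizes only the language-model GGUF. A minimal sketch, with placeholder paths; the CLI binary name varies across llama.cpp versions (older builds call it `llava-cli`, newer ones `llama-llava-cli` or `llama-mtmd-cli`):

```shell
# Quantize only the LLM weights; the projector (mmproj) GGUF is left as-is
# at f16 or q8_0. File names here are illustrative.
./llama-quantize model-f16.gguf model-Q4_K_M.gguf Q4_K_M

# At run time the two files are passed together, e.g.:
./llama-llava-cli -m model-Q4_K_M.gguf --mmproj mmproj-f16.gguf \
    --image photo.jpg -p "Describe the image."
```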

100 s is normal; a single image generates a lot of tokens.

Running on CPU is still very slow: over 100 seconds per image.

I find image processing on CPU frustrating too; all models seem about the same. Try making the image smaller and using more CPU cores, or use a GPU if you need real-time speed. But if you have to use a GPU, you might as well use something other than llama.cpp.
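To see why a single image is so slow on CPU, a rough patch-token estimate helps. This sketch assumes a ViT-style encoder with 14-pixel patches at a 336x336 input resolution (as in the CLIP ViT-L/14-336 encoder used by many LLaVA-style models); the exact count depends on the encoder and projector:

```python
def image_token_estimate(width: int, height: int, patch: int = 14) -> int:
    """Rough number of vision tokens: one per patch after the image
    is resized to the encoder's input resolution."""
    return (width // patch) * (height // patch)

# A 336x336 input yields 24 * 24 = 576 patch tokens, all of which
# must be prefilled before the first output token is produced.
print(image_token_estimate(336, 336))  # 576
```

Shrinking the input image only helps if it reduces the resolution actually fed to the encoder; many pipelines resize every image to a fixed size first.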

Thanks, you are right.
I tried compiling llama.cpp myself with OpenBLAS and optimization flags, but it's still slow.
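For anyone trying the same, a typical CMake build with OpenBLAS looks roughly like this; the flag names have changed across llama.cpp versions (older releases used `LLAMA_BLAS` instead of `GGML_BLAS`), so check the build docs for your checkout:

```shell
# Configure with the OpenBLAS backend; BLAS mainly speeds up batched
# prompt/image processing, not single-token generation.
cmake -B build -DGGML_BLAS=ON -DGGML_BLAS_VENDOR=OpenBLAS
cmake --build build --config Release -j
```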

openmartin changed discussion status to closed

If you have a CPU with AVX-512 support, that can give roughly a 30% performance boost.
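A quick way to check for AVX-512 is to look for the foundation flag in `/proc/cpuinfo`. This is Linux-only; on other systems the sketch below simply returns False:

```python
def has_avx512(cpuinfo_path: str = "/proc/cpuinfo") -> bool:
    """Return True if the CPU advertises the AVX-512 foundation flag
    (avx512f). Falls back to False if the file is unavailable."""
    try:
        with open(cpuinfo_path) as f:
            return "avx512f" in f.read()
    except OSError:
        return False

print(has_avx512())
```

llama.cpp's CMake build normally detects and uses native CPU features automatically, so if this reports True, a locally compiled binary should already benefit.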
