Can the mmproj file be quantized to Q4, and how much quality does that cost?
Also, running on CPU is still very slow: over 100 seconds per image.
There is no support for q4 when using llama.cpp to make the projector file; q8 is the smallest.
100 seconds is normal. A single image produces a lot of tokens.
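As a rough back-of-envelope (the numbers below are assumptions, not measurements; adjust them for your model and hardware), the image token count alone explains latencies in this range:

```python
# Rough per-image latency estimate on CPU (all numbers are assumed, not measured).
image_tokens = 576            # e.g. a 336x336 CLIP ViT-L/14 encoder yields 24*24 = 576 tokens
prompt_eval_tok_per_s = 8     # hypothetical CPU prompt-eval speed for a ~7B model
vision_encode_s = 20          # hypothetical time to run the vision encoder itself on CPU

total_s = vision_encode_s + image_tokens / prompt_eval_tok_per_s
print(f"~{total_s:.0f} s per image")   # ~92 s
```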
I find this frustrating with image-model processing on CPU; all models seem about the same. Try making the image smaller and using more CPU cores, or use a GPU if you need real-time speed. But at that point you might as well use something other than llama.cpp.
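A minimal sketch of the "smaller image, more threads" suggestion, assuming Pillow is installed and a llava-style llama.cpp CLI build; the file names and flag values below are illustrative, check your own build:

```python
import subprocess
from PIL import Image

SRC, SMALL = "photo.jpg", "photo_small.jpg"   # hypothetical file names

# Downscale before handing the image to llama.cpp; a huge source image only
# adds decode/resize work, since the vision encoder uses a fixed input size.
img = Image.open(SRC)
img.thumbnail((448, 448))        # keeps aspect ratio, caps the longer side
img.save(SMALL, quality=90)

# Run the multimodal CLI with more CPU threads (-t).
subprocess.run([
    "./llava-cli",
    "-m", "model-q4_k.gguf",          # hypothetical model file
    "--mmproj", "mmproj-q8_0.gguf",   # hypothetical projector file
    "--image", SMALL,
    "-t", "8",                        # use more cores
    "-p", "Describe this image.",
], check=True)
```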
Thanks, you are right.
I tried compiling llama.cpp myself with OpenBLAS and extra optimization flags, but it is still slow.
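For reference, a BLAS-enabled CPU build looks roughly like the sketch below; the CMake option names have changed across llama.cpp versions (older trees use LLAMA_BLAS instead of GGML_BLAS), so treat them as assumptions and check the current build docs:

```python
import subprocess

# Configure with OpenBLAS and a release build (option names are assumptions).
subprocess.run(["cmake", "-B", "build",
                "-DGGML_BLAS=ON",
                "-DGGML_BLAS_VENDOR=OpenBLAS",
                "-DCMAKE_BUILD_TYPE=Release"], check=True)
subprocess.run(["cmake", "--build", "build", "--config", "Release", "-j"], check=True)
```

Note that BLAS mainly helps batched prompt evaluation; the image-encoder path may not benefit much, which would match what you are seeing.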
If you have a CPU with AVX-512 support, that can give roughly a 30% performance boost.
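A quick way to check for that before rebuilding (Linux-only sketch; native llama.cpp builds normally pick up the instruction set automatically):

```python
# Check /proc/cpuinfo for the AVX-512 foundation flag (Linux only).
def has_avx512(cpuinfo_path="/proc/cpuinfo"):
    with open(cpuinfo_path) as f:
        return any(line.startswith("flags") and "avx512f" in line for line in f)

print("AVX-512F supported:", has_avx512())
```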