GGUF seems to be broken

#10
by aportnoy - opened

I see the following in llama.cpp logs:

load: control-looking token:    106 '<end_of_turn>' was not control-type; this is probably a bug in the model. its type will be overridden
load: special_eos_id is not in special_eog_ids - the tokenizer config may be incorrect

I also sometimes see <start_of_turn> at the end of model responses, which didn't happen with the unofficial Q4_K_M quants.

I can't even download it properly; I've tried various methods and keep getting a "file not found" error:

File can no longer be found. It has likely been moved or deleted.

@TomDarkWay I still see it and I'm able to download it. Isn't it this one? https://huggingface.co/google/gemma-3-27b-it-qat-q4_0-gguf/resolve/main/gemma-3-27b-it-q4_0.gguf

I didn't have the same issue as you, but had this:

~ ❯❯❯ ollama run hf.co/google/gemma-3-27b-it-qat-q4_0-gguf:Q4_K_M
pulling manifest 
Error: pull model manifest: 400: {"error":"The specified tag is not available in the repository. Please use another tag or \"latest\""}

The error was actually to be expected: it was caused by the broken copy-snippet function when using Use this model --> Ollama --> Copy.
It should copy the snippet without altering it, but it turned out it was changing Q4_0 to Q4_K_M.

The snippet (correct):

ollama run hf.co/google/gemma-3-27b-it-qat-q4_0-gguf:Q4_0  

The snippet copied to the clipboard (wrong):

ollama run hf.co/google/gemma-3-27b-it-qat-q4_0-gguf:Q4_K_M

I don't know if this was relevant to your case @TomDarkWay, but it might at least help others having difficulty pulling it.
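As a quick sanity check before pulling, you can inspect which tag a copied snippet actually points at. A minimal shell sketch (the snippet string below is the wrong one this thread describes, used here only to show the check):

```shell
# Extract the tag (the part after the last ":") from a copied snippet
# and warn if it doesn't match the quant you expect (Q4_0 for this repo).
SNIPPET='ollama run hf.co/google/gemma-3-27b-it-qat-q4_0-gguf:Q4_K_M'
TAG="${SNIPPET##*:}"
if [ "$TAG" != "Q4_0" ]; then
    echo "warning: snippet uses tag '$TAG', expected Q4_0"
fi
```

Running this against the clipboard snippet from the broken Copy button prints the warning, since the tag was silently changed to Q4_K_M.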

Also, just to let everyone know: the QAT versions are now available in Ollama. They sometimes push updates, but their model listing is sorted by "repo creation" rather than "last updated", so the updates are easily missed.

https://ollama.com/library/gemma3:27b-it-qat

And @pdevine also solved the mmproj issue, so vision works too.

Ollama has a slightly different naming convention for the tensors/parameters, since the two versions were developed independently of each other. It also uses a combined version of the vision tower + language model, which is different from how llama.cpp works. If you want this version to work properly with Ollama, it will unfortunately have to be repacked correctly.
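To illustrate what "repacked" means here: because the two projects name tensors differently, a conversion step essentially has to rename every tensor from one convention to the other. A minimal Python sketch of the mechanics, using purely illustrative prefixes (these are NOT the actual llama.cpp or Ollama naming tables):

```python
# Illustrative sketch of tensor-name remapping between two GGUF naming
# conventions. The prefixes below are hypothetical, chosen only to show
# the mechanics; the real llama.cpp/Ollama tables are larger.

def remap_tensor_name(name: str, mapping: dict) -> str:
    """Rewrite a tensor name by swapping a known prefix; pass through otherwise."""
    for src, dst in mapping.items():
        if name.startswith(src):
            return dst + name[len(src):]
    return name

# Hypothetical prefix table (not the real conventions)
PREFIX_MAP = {
    "blk.": "layers.",
    "token_embd.": "embedding.",
}

print(remap_tensor_name("blk.0.attn_q.weight", PREFIX_MAP))
```

A tensor whose prefix isn't in the table passes through unchanged, which is why a partial mapping produces a file that loads but misbehaves, rather than one that fails outright.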

Google org

Hi @aportnoy, could you please confirm whether the issue has been resolved? Thank you.
