GGUF seems to be broken
I see the following in the llama.cpp logs:
load: control-looking token: 106 '<end_of_turn>' was not control-type; this is probably a bug in the model. its type will be overridden
load: special_eos_id is not in special_eog_ids - the tokenizer config may be incorrect
Unlike with unofficial Q4_K_M quants, I also sometimes see <start_of_turn> at the end of model responses.
This unofficial Q4_0 doesn't have the issue: https://huggingface.co/bartowski/google_gemma-3-27b-it-qat-GGUF/blob/main/google_gemma-3-27b-it-qat-Q4_0.gguf.
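For anyone who wants to verify the tokenizer metadata directly, here is a minimal sketch assuming the gguf-py package and a local copy of the file (the path is a placeholder). It prints the stored type of token 106, which per the GGUF spec should be 3 (CONTROL) for <end_of_turn>:

```python
# Minimal sketch, assuming the gguf-py package and a local copy of the file
# (path is a placeholder). Array fields in GGUFReader expose one `data`
# index per element, pointing into `parts`.
from gguf import GGUFReader

reader = GGUFReader("gemma-3-27b-it-q4_0.gguf")

tokens = reader.fields["tokenizer.ggml.tokens"]
types = reader.fields["tokenizer.ggml.token_type"]

token_106 = bytes(tokens.parts[tokens.data[106]]).decode("utf-8")
type_106 = int(types.parts[types.data[106]][0])

# GGUF token types: 1 = NORMAL, 2 = UNKNOWN, 3 = CONTROL, 4 = USER_DEFINED
print(f"token 106: {token_106!r}, type: {type_106}")
```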
I can't even download it properly; I've tried various methods and keep getting a "file not found" error:
File can no longer be found. It has likely been moved or deleted.
@TomDarkWay I still see it and I'm able to download it. Isn't it this one? https://huggingface.co/google/gemma-3-27b-it-qat-q4_0-gguf/resolve/main/gemma-3-27b-it-q4_0.gguf
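If the browser download keeps failing, here is a minimal sketch assuming the huggingface_hub package. Note the repo is gated, so a token from an account that has accepted the Gemma license may be required:

```python
# Minimal sketch, assuming the huggingface_hub package. The repo is gated,
# so pass a token from an account that has accepted the Gemma license if
# you get a 401/403.
from huggingface_hub import hf_hub_download

path = hf_hub_download(
    repo_id="google/gemma-3-27b-it-qat-q4_0-gguf",
    filename="gemma-3-27b-it-q4_0.gguf",
    # token="hf_...",  # placeholder; only needed if unauthenticated
)
print(path)
```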
I didn't have the same issue as you, but I ran into this:
~ ❯❯❯ ollama run hf.co/google/gemma-3-27b-it-qat-q4_0-gguf:Q4_K_M
pulling manifest
Error: pull model manifest: 400: {"error":"The specified tag is not available in the repository. Please use another tag or \"latest\""}
And the error was actually to be expected: it was caused by the broken copy-snippet function when using Use this model --> Ollama --> Copy. It should copy the snippet without altering it, but it turned out to be changing Q4_0 to Q4_K_M.
The snippet (correct):
ollama run hf.co/google/gemma-3-27b-it-qat-q4_0-gguf:Q4_0
What got copied to the clipboard (wrong):
ollama run hf.co/google/gemma-3-27b-it-qat-q4_0-gguf:Q4_K_M
I don't know if this was relevant to your case, @TomDarkWay, but it might at least help others having difficulty pulling it.
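For anyone wanting to double-check which quants actually exist in the repo, here is a minimal sketch assuming the huggingface_hub package. It shows why only the Q4_0 tag can be pulled:

```python
# Minimal sketch, assuming the huggingface_hub package. Lists the files in
# the repo, which shows that no Q4_K_M GGUF exists there -- only q4_0.
from huggingface_hub import list_repo_files

files = list_repo_files("google/gemma-3-27b-it-qat-q4_0-gguf")
print([f for f in files if f.endswith(".gguf")])
```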
Also, just to let everyone know: the QAT versions are now available in Ollama. They sometimes push updates, but their model listing is sorted by repo creation date rather than by last updated, so updates are easily missed.
https://ollama.com/library/gemma3:27b-it-qat
And @pdevine also solved the mmproj issue, so vision works too.
Ollama has a slightly different naming convention for the tensors/parameters, since the two versions were developed independently of each other. It also uses a combined version of the vision tower + language model, which is different from how llama.cpp works. If you want this version to work properly with Ollama, it will unfortunately have to be repacked correctly.
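If anyone wants to see the naming difference for themselves, here is a minimal sketch assuming the gguf-py package and a local copy of the file (the path is a placeholder). It prints the tensor names stored in this GGUF, which can be compared against the names in an Ollama-packaged model:

```python
# Minimal sketch, assuming the gguf-py package and a local copy of the
# file (path is a placeholder). Prints tensor names so the two naming
# conventions can be compared side by side.
from gguf import GGUFReader

reader = GGUFReader("gemma-3-27b-it-q4_0.gguf")
for tensor in reader.tensors[:10]:  # the first few names show the scheme
    print(tensor.name, list(tensor.shape), tensor.tensor_type.name)
```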