GGUF seems to be broken

#10
by aportnoy - opened

I see the following in llama.cpp logs:

load: control-looking token:    106 '<end_of_turn>' was not control-type; this is probably a bug in the model. its type will be overridden
load: special_eos_id is not in special_eog_ids - the tokenizer config may be incorrect

I also sometimes see <start_of_turn> at the end of model responses, which didn't happen with the unofficial Q4_K_M quants.

I can't even download it properly; I've tried various methods and keep getting a "file not found" error:

File can no longer be found. It has likely been moved or deleted.

@TomDarkWay I still see it and I'm able to download it. Isn't it this one? https://huggingface.co/google/gemma-3-27b-it-qat-q4_0-gguf/resolve/main/gemma-3-27b-it-q4_0.gguf

I didn't have the same issue as you, but had this:

~ ❯❯❯ ollama run hf.co/google/gemma-3-27b-it-qat-q4_0-gguf:Q4_K_M
pulling manifest 
Error: pull model manifest: 400: {"error":"The specified tag is not available in the repository. Please use another tag or \"latest\""}

The error was actually to be expected: it was caused by the broken copy-snippet function when using Use this model --> Ollama --> Copy.
It should copy the snippet without altering it, but it turned out it was changing Q4_0 to Q4_K_M.

The snippet (correct):

ollama run hf.co/google/gemma-3-27b-it-qat-q4_0-gguf:Q4_0  

The snippet copied to the clipboard (wrong):

ollama run hf.co/google/gemma-3-27b-it-qat-q4_0-gguf:Q4_K_M

I don't know if this was relevant to your case @TomDarkWay, but it might at least help others having difficulty pulling it.
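As a quick sanity check before pulling, you can inspect which tag a copied snippet actually points at. A minimal shell sketch (the snippet string below is the wrong one this thread describes, used here only to show the check):

```shell
# Extract the tag (the part after the last ":") from a copied snippet
# and warn if it doesn't match the quant you expect (Q4_0 for this repo).
SNIPPET='ollama run hf.co/google/gemma-3-27b-it-qat-q4_0-gguf:Q4_K_M'
TAG="${SNIPPET##*:}"
if [ "$TAG" != "Q4_0" ]; then
    echo "warning: snippet uses tag '$TAG', expected Q4_0"
fi
```

Running this against the clipboard snippet from the broken Copy button prints the warning, since the tag was silently changed to Q4_K_M.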

Also, just to let everyone know: the QAT versions are now available in Ollama. They sometimes push updates, but their model listing is sorted by "repo creation" rather than "last updated", so the updates are easily missed.

https://ollama.com/library/gemma3:27b-it-qat

And @pdevine also solved the mmproj issue, so vision works too.

Ollama has a slightly different naming convention for the tensors/parameters, since the two versions were developed independently of each other. It also uses a combined version of the vision tower + language model, which is different from how llama.cpp works. If you want this version to work properly with Ollama, it will unfortunately have to be repacked correctly.
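To illustrate what "repacked" means here: because the two projects name tensors differently, a conversion step essentially has to rename every tensor from one convention to the other. A minimal Python sketch of the mechanics, using purely illustrative prefixes (these are NOT the actual llama.cpp or Ollama naming tables):

```python
# Illustrative sketch of tensor-name remapping between two GGUF naming
# conventions. The prefixes below are hypothetical, chosen only to show
# the mechanics; the real llama.cpp/Ollama tables are larger.

def remap_tensor_name(name: str, mapping: dict) -> str:
    """Rewrite a tensor name by swapping a known prefix; pass through otherwise."""
    for src, dst in mapping.items():
        if name.startswith(src):
            return dst + name[len(src):]
    return name

# Hypothetical prefix table (not the real conventions)
PREFIX_MAP = {
    "blk.": "layers.",
    "token_embd.": "embedding.",
}

print(remap_tensor_name("blk.0.attn_q.weight", PREFIX_MAP))
```

A tensor whose prefix isn't in the table passes through unchanged, which is why a partial mapping produces a file that loads but misbehaves, rather than one that fails outright.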

Google org

Hi @aportnoy, could you please confirm whether the issue has been resolved? Thank you.
