This is simply a patched version of the original Qwen-QwQ-32B model. (https://huggingface.co/Qwen/QwQ-32B)

Changed Functionality:

This version of the model will "remember thinking" as the conversation progresses. That is to say, all text contained between the <think> tags will be preserved in the context window and be utilized during all future response generations.

Pros: It can remember its previous thinking processes, thus potentially being better at unpacking complex concepts over a series of back-and-forth prompts.

Cons: The context window will fill much faster. And coherence will likely drop-off sooner.

Download GGUF:

https://huggingface.co/theouterspaced/Qwen_QwQ-32B-Q8_0_remember-thinking.gguf/resolve/main/Qwen_QwQ-32B-Q8_0%2Bremember-thinking.gguf

Downloads last month: 9

GGUF

Model size

33B params

Architecture

qwen2

Hardware compatibility

8-bit

Inference Providers NEW

This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Model tree for theouterspaced/Qwen_QwQ-32B-Q8_0_remember-thinking.gguf

Base model

Qwen/Qwen2.5-32B

Finetuned

Qwen/QwQ-32B

Quantized

(170)

this model