# Liquid-Thinking-Preview - GGUF
This repository contains GGUF quantized versions of the Liquid-Thinking model, suitable for fast CPU-based inference.
These files were generated from the full-parameter fine-tuned model using the Unsloth library. For full details on the training process, dataset, and hyperparameters, please refer to the main model card.
## Quantization Methods
The following quantizations are provided:
- `q4_k_m`: A 4-bit quantization with K-quants, offering a good balance between model size, speed, and quality. Recommended for general use.
- `q8_0`: An 8-bit quantization that preserves higher quality at the cost of a larger file size. Use this if you have sufficient RAM and need maximum fidelity.
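For reference, quantizations like these are typically produced from a full-precision GGUF export using llama.cpp's quantize tool. The sketch below is illustrative only: the source directory `./Liquid-Thinking`, the F16 file name, and the binary names (`convert_hf_to_gguf.py` and `./quantize` vs. `./llama-quantize`) all depend on your checkout and llama.cpp version.

```bash
# Convert the Hugging Face checkpoint to a full-precision GGUF first.
# (The converter script name varies by llama.cpp version.)
python convert_hf_to_gguf.py ./Liquid-Thinking --outfile Liquid-Thinking-GGUF.F16.gguf --outtype f16

# Then quantize to the variants shipped in this repository.
./quantize Liquid-Thinking-GGUF.F16.gguf Liquid-Thinking-GGUF.Q4_K_M.gguf Q4_K_M
./quantize Liquid-Thinking-GGUF.F16.gguf Liquid-Thinking-GGUF.Q8_0.gguf Q8_0
```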
## How to Use with llama.cpp
You can easily run these GGUF models using llama.cpp.
1. Clone and build llama.cpp:

   ```bash
   git clone https://github.com/ggerganov/llama.cpp.git
   cd llama.cpp
   make
   ```

2. Download a GGUF file: Choose a quantization and download it from the "Files and versions" tab of this repository. For example, to download the Q4_K_M version:

   ```bash
   huggingface-cli download kreasof-ai/Liquid-Thinking-Preview-GGUF Liquid-Thinking-GGUF.Q4_K_M.gguf --local-dir .
   # Or with wget:
   # wget https://huggingface.co/kreasof-ai/Liquid-Thinking-Preview-GGUF/resolve/main/Liquid-Thinking-GGUF.Q4_K_M.gguf
   ```

3. Run inference: This model uses the ChatML prompt template. You can use the `-i` flag for an interactive session with the correct formatting:

   ```bash
   ./main -m Liquid-Thinking-GGUF.Q4_K_M.gguf \
     --color \
     -n 2048 \
     -i \
     --reverse-prompt 'user:' \
     --in-prefix ' ' \
     --in-suffix 'assistant:' \
     -c 8192 \
     --temp 0.3 \
     --min-p 0.15 \
     --repeat-penalty 1.05
   ```

   Alternatively, for a single prompt, use the `--chatml` flag, which is designed for this template structure:

   ```bash
   ./main -m Liquid-Thinking-GGUF.Q4_K_M.gguf --chatml -p "user: Explain the step-by-step process of photosynthesis in simple terms.\nassistant:"
   ```
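If you prefer an HTTP API over the interactive CLI, llama.cpp also ships a server binary. A minimal sketch, assuming the binary is named `./server` (newer builds call it `./llama-server`); the port and prompt below are illustrative:

```bash
# Start an OpenAI-compatible server with the same context size as above.
./server -m Liquid-Thinking-GGUF.Q4_K_M.gguf -c 8192 --port 8080

# In another shell, query the chat endpoint; the server applies the
# model's chat template to the messages for you.
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"messages": [{"role": "user", "content": "Explain photosynthesis in simple terms."}], "temperature": 0.3}'
```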
## Use with other tools (LM Studio, Ollama, etc.)
You can also use these GGUF files with any GUI-based tool that supports them:
- LM Studio: Search for this model from the Hugging Face integration within the app.
- Ollama: Create a `Modelfile` to import and run the model locally (see the sketch after this list).
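As a concrete example of the Ollama route, a minimal `Modelfile` might look like the following. The local model name `liquid-thinking` and the parameter values are illustrative assumptions; the template mirrors the ChatML format shown in the Prompt Template section below.

```bash
# Write a Modelfile that points at the downloaded GGUF and encodes
# the ChatML prompt format this model expects.
cat > Modelfile <<'EOF'
FROM ./Liquid-Thinking-GGUF.Q4_K_M.gguf
TEMPLATE """<|im_start|>user
{{ .Prompt }}<|im_end|>
<|im_start|>assistant
"""
PARAMETER temperature 0.3
PARAMETER num_ctx 8192
EOF

# Import the model under a local name, then chat with it.
ollama create liquid-thinking -f Modelfile
ollama run liquid-thinking "Explain photosynthesis in simple terms."
```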
## Prompt Template
It is crucial to use the correct prompt template for the best results. This model expects the ChatML format:
```
<|im_start|>user
Your question or instruction here.<|im_end|>
<|im_start|>assistant
```
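If you pass a prompt manually (for example via `-p` in llama.cpp), the special tokens need to be spelled out. A minimal sketch, using the `-e` flag so the `\n` escapes are processed; the question text is just an example:

```bash
./main -m Liquid-Thinking-GGUF.Q4_K_M.gguf -e \
  -p "<|im_start|>user\nSummarize the water cycle in two sentences.<|im_end|>\n<|im_start|>assistant\n" \
  -n 512 --temp 0.3
```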