# Qwen/Qwen2.5-Math-7B - GGUF
This repository contains GGUF quantizations of Qwen/Qwen2.5-Math-7B.
## About GGUF

GGUF is a binary file format for storing models for local inference with llama.cpp and other compatible engines. The quantized variants in this repository reduce the precision of the model weights so that large language models can run on consumer hardware.
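Each GGUF file is self-describing: its metadata and tensor layout can be inspected without loading the weights. A minimal sketch, assuming the `gguf` Python package from the llama.cpp repository is installed (`pip install gguf`); the filename is illustrative:

```python
from gguf import GGUFReader

# Read metadata and tensor info without loading the full weights
reader = GGUFReader("model-q4_0.gguf")
for name in reader.fields:
    print(name)  # metadata keys, e.g. general.architecture
print(f"{len(reader.tensors)} tensors")
```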
## Files

| Filename | Quant type | File Size | Description |
| --- | --- | --- | --- |
| model-f16.gguf | F16 | Large | Original half-precision weights (not quantized) |
| model-q4_0.gguf | Q4_0 | Small | 4-bit quantization |
| model-q4_1.gguf | Q4_1 | Small | 4-bit quantization, slightly higher quality than Q4_0 |
| model-q5_0.gguf | Q5_0 | Medium | 5-bit quantization |
| model-q5_1.gguf | Q5_1 | Medium | 5-bit quantization, slightly higher quality than Q5_0 |
| model-q8_0.gguf | Q8_0 | Large | 8-bit quantization, near-original quality |
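To fetch a single quantization level rather than the whole repository, the Hugging Face CLI works well; the repository id below is a placeholder, so substitute this repo's actual id:

```bash
# Download only the Q4_0 file (repo id is illustrative)
huggingface-cli download your-username/Qwen2.5-Math-7B-GGUF \
  model-q4_0.gguf --local-dir .
```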
## Usage

You can use these models with llama.cpp or any other GGUF-compatible inference engine.

### llama.cpp

```bash
./llama-cli -m model-q4_0.gguf -p "Your prompt here"
```
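llama.cpp also ships an OpenAI-compatible HTTP server if you'd rather query the model over the network; a minimal invocation, assuming a recent llama.cpp build:

```bash
# Serve the model on http://localhost:8080
./llama-server -m model-q4_0.gguf --port 8080
```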
### Python (using llama-cpp-python)

```python
from llama_cpp import Llama

# Load the quantized model
llm = Llama(model_path="model-q4_0.gguf")

# Run a single completion
output = llm("Your prompt here", max_tokens=512)
print(output["choices"][0]["text"])
```
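For instruction-style use, llama-cpp-python also provides a chat API that applies the chat template embedded in the GGUF file (if present); the prompt and parameters below are illustrative:

```python
from llama_cpp import Llama

llm = Llama(model_path="model-q4_0.gguf", n_ctx=4096)  # larger context window
response = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Solve for x: 2x + 3 = 11"}],
    max_tokens=256,
)
print(response["choices"][0]["message"]["content"])
```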
## Original Model

This is a quantized version of [Qwen/Qwen2.5-Math-7B](https://huggingface.co/Qwen/Qwen2.5-Math-7B). Please refer to the original model card for more information about the model's capabilities, training data, and usage guidelines.
## Conversion Details

- Converted using llama.cpp (an example workflow is sketched below)
- Original model downloaded from Hugging Face
- Multiple quantization levels provided for different use cases
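For reference, a typical llama.cpp workflow that produces files like these looks as follows; script and binary names match recent llama.cpp releases, and the paths are illustrative:

```bash
# 1. Convert the Hugging Face checkpoint to an f16 GGUF
python convert_hf_to_gguf.py ./Qwen2.5-Math-7B --outfile model-f16.gguf --outtype f16

# 2. Quantize the f16 file to the desired level, e.g. Q4_0
./llama-quantize model-f16.gguf model-q4_0.gguf Q4_0
```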
## License
This model inherits the license from the original model. Please check the original model's license for usage terms.