---
quantized_by: bobchenyx
base_model:
- zai-org/GLM-4.6
base_model_relation: quantized
license: mit
tags:
- GLM
- GLM-4.6
- transformers
- GGUF
pipeline_tag: text-generation
---

## Llamacpp Quantizations of zai-org/GLM-4.6

Adopting **BF16** & **imatrix** from [unsloth/GLM-4.6-GGUF](https://huggingface.co/unsloth/GLM-4.6-GGUF). (Huge fan of unsloth.)

Personalized replication of low-bit mixed-precision quants using the `--tensor-type` option in [llama.cpp](https://github.com/ggml-org/llama.cpp); a sketch of the invocation is given at the end of this card.

```
- IQ1_M  :  83.63 GiB (2.01 BPW)
- Q2_K_L : 124.97 GiB (3.01 BPW)
- Q4_K_L : 204.10 GiB (4.91 BPW)
```

## Download Guide

```
# !pip install huggingface_hub hf_transfer
import os
os.environ["HF_HUB_ENABLE_HF_TRANSFER"] = "1"

from huggingface_hub import snapshot_download

snapshot_download(
    repo_id = "bobchenyx/GLM-4.6-GGUF",
    local_dir = "bobchenyx/GLM-4.6-GGUF",
    allow_patterns = ["*IQ1_M*"],  # or "*Q2_K_L*", "*Q4_K_L*"
)
```

## Usage Guide

```
git clone https://github.com/ggml-org/llama.cpp.git
cd llama.cpp
cmake -B build -DGGML_CUDA=ON -DBUILD_SHARED_LIBS=OFF -DLLAMA_CURL=OFF
cmake --build build --config Release -j --clean-first
```

```
# -m takes the path to the first GGUF shard; lower --ctx-size (e.g. 8192) if memory is tight
build/bin/llama-cli \
    -m bobchenyx/GLM-4.6-GGUF/GLM-4.6-IQ1_M/GLM-4.6-IQ1_M-00001-of-00003.gguf \
    --jinja \
    -ngl 99 \
    --temp 1.0 --top-p 0.95 --top-k 40 --min-p 0.0 \
    --ctx-size 16384
```

Pass `/nothink` to disable thinking mode.
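
For serving instead of an interactive CLI session, the same build also produces `llama-server`, which exposes an OpenAI-compatible HTTP API. The command below is a minimal sketch reusing the model path and sampling settings from the `llama-cli` example above; the host and port values are arbitrary choices, not part of the original card.

```
# Minimal sketch: serve the IQ1_M quant over HTTP (host/port are arbitrary)
build/bin/llama-server \
    -m bobchenyx/GLM-4.6-GGUF/GLM-4.6-IQ1_M/GLM-4.6-IQ1_M-00001-of-00003.gguf \
    --jinja \
    -ngl 99 \
    --temp 1.0 --top-p 0.95 --top-k 40 --min-p 0.0 \
    --ctx-size 16384 \
    --host 0.0.0.0 --port 8080
```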
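
Regarding the `--tensor-type` replication mentioned above: the exact per-tensor recipe used for these files is not published here. The command below is only a hedged sketch of how `llama-quantize` accepts per-tensor overrides together with an imatrix; the tensor patterns, target types, and file names are placeholders.

```
# Hedged sketch only -- tensor patterns, types, and paths are placeholders,
# not the recipe actually used for these quants.
build/bin/llama-quantize \
    --imatrix imatrix.dat \
    --tensor-type attn_v=q8_0 \
    --tensor-type ffn_down=q4_k \
    GLM-4.6-BF16.gguf GLM-4.6-custom.gguf Q2_K
```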