base_model:
- mistralai/Mistral-Small-24B-Instruct-2501
pipeline_tag: text-classification
---

# 🔥 Quantized Model: Mistral-Small-24B-Instruct-2501_GPTQ_G128_W4A16_MSE 🔥

This is a 4-bit quantized version of the [mistralai/Mistral-Small-24B-Instruct-2501](https://huggingface.co/mistralai/Mistral-Small-24B-Instruct-2501) model, quantized by [ConfidentialMind.com](https://www.confidentialmind.com) 🤖✨

It leverages the open-source GPTQModel quantization to achieve 4-bit precision with a group size of 128, resulting in a faster model with minimal performance degradation.

Quantization ran on a single NVIDIA A100 GPU with 80 GB of VRAM.

*Note:* `batch_size` is set quite high because the model is small; you may need to adjust it to fit your GPU's VRAM.

*Note 2:* Due to the "packed" nature of the Mistral-Small weights, MSE was applied aggressively along with a higher damping factor. This reduced loss and perplexity, though a group size of 32 (G32) is still recommended.

## Model Details

- **Original Model:** [mistralai/Mistral-Small-24B-Instruct-2501](https://huggingface.co/mistralai/Mistral-Small-24B-Instruct-2501)
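
The recipe described above (4-bit weights, 16-bit activations, group size 128, MSE-based rounding with raised damping) can be sketched with GPTQModel roughly as follows. The `damp_percent` and `mse` values are illustrative assumptions, not taken from this card, so verify the parameter names and defaults against the GPTQModel documentation:

```python
# Sketch of the quantization setup described in this card, using GPTQModel.
# Values marked "assumed" are illustrative, not taken from the card.

# W4A16 with group size 128, per the model name (G128_W4A16_MSE).
quant_kwargs = dict(
    bits=4,             # 4-bit weights (activations stay 16-bit)
    group_size=128,     # one quantization scale per 128 weights
    damp_percent=0.05,  # "higher damping factor" from Note 2 (assumed value)
    mse=2.4,            # MSE-based rounding from Note 2 (assumed value)
)

def quantize(calibration_texts, batch_size=8):
    """Quantize the base model; needs `pip install gptqmodel` and a large GPU."""
    from gptqmodel import GPTQModel, QuantizeConfig  # imported lazily: heavy dep

    model = GPTQModel.load(
        "mistralai/Mistral-Small-24B-Instruct-2501",
        QuantizeConfig(**quant_kwargs),
    )
    # Lower batch_size if you run out of VRAM (see the note above).
    model.quantize(calibration_texts, batch_size=batch_size)
    model.save("Mistral-Small-24B-Instruct-2501_GPTQ_G128_W4A16_MSE")
```

The saved checkpoint can then be loaded like any other GPTQ model, for example via Transformers or vLLM.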