+---
+license: apache-2.0
+language:
+- en
+library_name: transformers
+pipeline_tag: text-generation
+tags:
+- causal-lm
+- text-generation
+- bitsandbytes
+- 4-bit
+- nf4
+- qlora
+- peft
+base_model: mistralai/Mistral-7B-v0.3
+---
+# rapidfire-ai-inc/mistral-7b-sft-bnb-4bit
+> 4-bit NF4 quantized Mistral 7B (v0.3) checkpoint prepared for QLoRA fine-tuning and fast inference.
+## TL;DR
+- **Base model:** `mistralai/Mistral-7B-v0.3`
+- **Quantization:** 4-bit **bitsandbytes** (NF4 + double quant; bfloat16 compute)
+- **Purpose:** Ready-to-use base for **QLoRA** fine-tuning; also suitable for lightweight inference
+- **Suggested dtype:** `torch.bfloat16` compute with 4-bit weights
+---
+## Quickstart (Transformers + bitsandbytes)
+```python
+from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig
+import torch
+model_id = "rapidfire-ai-inc/mistral-7b-sft-bnb-4bit"
+bnb_config = BitsAndBytesConfig(
+    load_in_4bit=True,
+    bnb_4bit_compute_dtype=torch.bfloat16,
+    bnb_4bit_use_double_quant=True,
+    bnb_4bit_quant_type="nf4",
+)
+tok = AutoTokenizer.from_pretrained(model_id, use_fast=True)
+model = AutoModelForCausalLM.from_pretrained(
+    model_id,
+    device_map="auto",
+    quantization_config=bnb_config,
+    torch_dtype=torch.bfloat16,
+)
+messages = [
+    {"role": "system", "content": "You are a helpful assistant."},
+    {"role": "user",   "content": "Write a haiku about GPUs."}
+]
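+# Apply the chat template only if the tokenizer ships one; otherwise use the raw user text.
+if getattr(tok, "chat_template", None):
+    prompt = tok.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
+else:
+    prompt = messages[-1]["content"]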
+inputs = tok(prompt, return_tensors="pt").to(model.device)
+out = model.generate(
+    **inputs,
+    max_new_tokens=128,
+    do_sample=True,  # without sampling enabled, temperature and top_p are ignored
+    temperature=0.7,
+    top_p=0.9,
+)
+print(tok.decode(out[0], skip_special_tokens=True))
+```
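Because the checkpoint is described as a QLoRA base, the following is a minimal sketch of attaching LoRA adapters with the `peft` library. The rank, alpha, dropout, and target-module choices are illustrative assumptions rather than values given by the authors, and `attach_lora` is a hypothetical helper name.

```python
# Hypothetical hyperparameters for illustration; the model card does not specify any.
LORA_KWARGS = {
    "r": 16,
    "lora_alpha": 32,
    "lora_dropout": 0.05,
    "bias": "none",
    "task_type": "CAUSAL_LM",
    # Attention and MLP projection names used by the Mistral architecture in Transformers.
    "target_modules": [
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
}


def attach_lora(model):
    """Wrap a 4-bit model with trainable LoRA adapters; the quantized base stays frozen."""
    # Imported lazily so this snippet can be read or imported without peft installed.
    from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

    # Prepares a k-bit quantized model for training (e.g. freezes the base weights).
    model = prepare_model_for_kbit_training(model)
    return get_peft_model(model, LoraConfig(**LORA_KWARGS))
```

With the `model` object from the quickstart, `attach_lora(model).print_trainable_parameters()` reports how many parameters the adapters add relative to the frozen base.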
+### BitsAndBytes (4-bit) config
+```python
+from transformers import BitsAndBytesConfig
+import torch
+bnb_config = BitsAndBytesConfig(
+    load_in_4bit=True,
+    bnb_4bit_compute_dtype=torch.bfloat16,
+    bnb_4bit_use_double_quant=True,
+    bnb_4bit_quant_type="nf4",
+)
+```
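To give a rough sense of what NF4 with double quantization means for memory, the sketch below estimates the size of the quantized weights. It assumes about 7.25 billion parameters, all treated as quantized, and the block sizes described in the QLoRA paper (64 weights per block, 256 blocks per second-level group). Unquantized layers, activations, and the KV cache are not counted, so real usage will be higher.

```python
def nf4_weight_gib(n_params: float, block_size: int = 64,
                   dq_block_size: int = 256, double_quant: bool = True) -> float:
    """Estimate GiB needed for NF4 weights plus their scaling constants."""
    if double_quant:
        # One 8-bit constant per block, plus one fp32 constant per group of blocks.
        const_bits = 8 / block_size + 32 / (block_size * dq_block_size)
    else:
        # One fp32 scaling constant per block of weights.
        const_bits = 32 / block_size
    # 4 bits per weight plus the per-weight share of the constants, converted to GiB.
    return n_params * (4 + const_bits) / 8 / 2**30


# Assumption: approximate parameter count, not taken from the model card.
N_PARAMS = 7.25e9

for dq in (False, True):
    print(f"double_quant={dq}: {nf4_weight_gib(N_PARAMS, double_quant=dq):.2f} GiB")
```

Under these assumptions, double quantization reduces the constant overhead from 0.5 bits to roughly 0.127 bits per parameter, which is a modest saving compared with the 4-bit weights themselves.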
+---
+## Intended use & limitations
+**Use cases.** A compact, QLoRA-ready starting point for supervised fine-tuning (SFT) or preference tuning, plus low-memory inference.
+**Limitations.** Inherits all behaviors and restrictions of `mistralai/Mistral-7B-v0.3`. May produce inaccurate or biased content. Do not deploy in high‑risk settings without safeguards.
+**License.** Released under **apache-2.0**; use is also subject to the upstream model's license and acceptable-use policies.
+---
+## Notes
+- Trained weights are unchanged aside from quantization; no additional fine‑tuning was performed.
+- Use `apply_chat_template` if the upstream tokenizer provides a chat template.
+- On a single GPU, keep `load_in_4bit=True` to minimize weight memory and `torch_dtype=torch.bfloat16` for compute; bfloat16 needs a GPU with native support (for example, NVIDIA Ampere or newer).
+## Citation
+```bibtex
+@misc{rapidfireai_mistral_7b_sft_bnb_4bit_bnb4bit_2025,
+  title        = {mistral-7b-sft-bnb-4bit (RapidFire AI)},
+  author       = {RapidFire AI, Inc.},
+  year         = {2025},
+  howpublished = {\url{https://huggingface.co/rapidfire-ai-inc/mistral-7b-sft-bnb-4bit}}
+}
+```