rapidfire-ai-inc/mistral-7b-sft-bnb-4bit
4-bit NF4 quantized Mistral 7B (v0.3) checkpoint prepared for QLoRA fine-tuning and fast inference.
TL;DR
- Base model: mistralai/Mistral-7B-v0.3
- Quantization: 4-bit bitsandbytes (NF4 + double quantization; bfloat16 compute)
- Purpose: Ready-to-use base for QLoRA fine-tuning; also suitable for lightweight inference
- Suggested dtype: torch.bfloat16 compute with 4-bit weights
Quickstart (Transformers + bitsandbytes)
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig
import torch

model_id = "rapidfire-ai-inc/mistral-7b-sft-bnb-4bit"

# 4-bit NF4 quantization with double quantization and bfloat16 compute
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
)

tok = AutoTokenizer.from_pretrained(model_id, use_fast=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",
    quantization_config=bnb_config,
    torch_dtype=torch.bfloat16,
)

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Write a haiku about GPUs."},
]
prompt = tok.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tok(prompt, return_tensors="pt").to(model.device)

out = model.generate(
    **inputs,
    max_new_tokens=128,
    do_sample=True,  # required for temperature/top_p to take effect
    temperature=0.7,
    top_p=0.9,
)
print(tok.decode(out[0], skip_special_tokens=True))
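As a quick sanity check that the 4-bit weights fit on your GPU, you can inspect the loaded footprint. This is a minimal sketch that assumes the model object from the Quickstart above is still in scope.

# Parameter/buffer footprint of the loaded 4-bit model, reported in bytes.
print(f"Model footprint: {model.get_memory_footprint() / 1e9:.2f} GB")
if torch.cuda.is_available():
    # Peak CUDA allocation observed so far (includes activations from generate()).
    print(f"Peak CUDA memory: {torch.cuda.max_memory_allocated() / 1e9:.2f} GB")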
BitsAndBytes (4-bit) config
from transformers import BitsAndBytesConfig
import torch
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
)
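Because the checkpoint ships already quantized, attaching LoRA adapters on top of the frozen 4-bit weights is the intended fine-tuning path. The sketch below is a minimal, non-authoritative example using the peft library; the target_modules list and LoRA hyperparameters are illustrative assumptions, not values used to produce this repository.

from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training
from transformers import AutoModelForCausalLM

# Load the 4-bit base (same bnb_config as above), then prepare it for k-bit training.
model = AutoModelForCausalLM.from_pretrained(
    "rapidfire-ai-inc/mistral-7b-sft-bnb-4bit",
    device_map="auto",
    quantization_config=bnb_config,
)
model = prepare_model_for_kbit_training(model)

# Illustrative LoRA settings; tune rank/alpha/dropout and target modules for your task.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # assumed attention projections
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only the adapter weights are trainable

From here the PEFT-wrapped model can be passed to a standard Trainer or TRL's SFTTrainer; the 4-bit base stays frozen and only the adapter weights receive gradients.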
Intended use & limitations
Use cases. A compact, QLoRA-ready starting point for supervised fine-tuning (SFT) or preference tuning, plus low-memory inference.
Limitations. Inherits all behaviors and restrictions of mistralai/Mistral-7B-v0.3. May produce inaccurate or biased content. Do not deploy in high-risk settings without safeguards.
License. This repository follows the Apache-2.0 terms and the upstream model's license and acceptable-use policies.
Notes
- Trained weights are unchanged aside from quantization; no additional fine-tuning was performed.
- Use apply_chat_template if the upstream tokenizer provides a chat template (a fallback sketch follows this list).
- For best throughput on a single GPU, keep torch_dtype=torch.bfloat16 and load_in_4bit=True.
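If the upstream tokenizer ships without a chat template, apply_chat_template may raise an error (behavior varies by transformers version). A minimal fallback, assuming the tok object from the Quickstart, is to check for a template and otherwise pass the prompt as plain text:

# Fall back to a plain-text prompt when no chat template is defined (assumes `tok` from the Quickstart).
user_message = "Write a haiku about GPUs."
if tok.chat_template is not None:
    prompt = tok.apply_chat_template(
        [{"role": "user", "content": user_message}],
        tokenize=False,
        add_generation_prompt=True,
    )
else:
    prompt = user_message  # base models are plain-text completers; no special formatting needed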
Citation
@misc{rapidfireai_mistral_7b_sft_bnb_4bit_bnb4bit_2025,
  title        = {mistral-7b-sft-bnb-4bit (RapidFire AI)},
  author       = {RapidFire AI, Inc.},
  year         = {2025},
  howpublished = {\url{https://huggingface.co/rapidfire-ai-inc/mistral-7b-sft-bnb-4bit}}
}