rapidfire-ai-inc/mistral-7b-sft-bnb-4bit

4-bit NF4 quantized Mistral 7B (v0.3) checkpoint prepared for QLoRA fine-tuning and fast inference.

TL;DR

  • Base model: mistralai/Mistral-7B-v0.3
  • Quantization: 4-bit bitsandbytes (NF4 + double quant; bfloat16 compute)
  • Purpose: Ready-to-use base for QLoRA fine-tuning; also suitable for lightweight inference
  • Suggested dtype: torch.bfloat16 compute with 4-bit weights

Quickstart (Transformers + bitsandbytes)

from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig
import torch

model_id = "rapidfire-ai-inc/mistral-7b-sft-bnb-4bit"

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
)

tok = AutoTokenizer.from_pretrained(model_id, use_fast=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",
    quantization_config=bnb_config,
    torch_dtype=torch.bfloat16,
)

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user",   "content": "Write a haiku about GPUs."}
]
prompt = tok.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

inputs = tok(prompt, return_tensors="pt").to(model.device)
out = model.generate(
    **inputs,
    max_new_tokens=128,
    do_sample=True,   # sampling must be enabled for temperature/top_p to take effect
    temperature=0.7,
    top_p=0.9,
)
print(tok.decode(out[0], skip_special_tokens=True))
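
If you want to sanity-check the 4-bit load, Transformers' get_memory_footprint() reports the weight footprint of the loaded model; the figure in the comment below is a rough expectation for 4-bit 7B weights, not a measured value for this checkpoint.

# Optional: confirm the quantized weights occupy only a few GiB.
footprint_gib = model.get_memory_footprint() / 1024**3
print(f"Weight footprint: {footprint_gib:.2f} GiB")  # roughly 4-5 GiB is typical for 4-bit 7B weights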

BitsAndBytes (4-bit) config

from transformers import BitsAndBytesConfig
import torch

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
)

Intended use & limitations

Use cases. A compact, QLoRA-ready starting point for supervised fine-tuning (SFT) or preference tuning, plus low-memory inference.
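
Since the primary use case is QLoRA, the sketch below shows one way to attach LoRA adapters on top of the 4-bit weights with peft, reusing the model loaded in the quickstart. The rank, alpha, dropout, and target modules are illustrative defaults, not settings validated for this checkpoint.

from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

# Prepare the quantized model for training (casts norm layers, enables input grads).
model = prepare_model_for_kbit_training(model)

lora_config = LoraConfig(
    r=16,                  # illustrative rank; tune for your task
    lora_alpha=32,
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # common attention projections in Mistral
)

model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only the LoRA adapter weights are trainable
# From here, train with your preferred trainer (e.g. transformers Trainer or trl's SFTTrainer).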

Limitations. Inherits all behaviors and restrictions of mistralai/Mistral-7B-v0.3. May produce inaccurate or biased content. Do not deploy in high-risk settings without safeguards.

License. This repository follows the apache-2.0 terms and the upstream model's license and acceptable-use policies.


Notes

  • Trained weights are unchanged aside from quantization; no additional fine-tuning was performed.
  • Use apply_chat_template if the upstream tokenizer provides a chat template (see the fallback sketch after this list).
  • For best throughput on a single GPU, keep torch_dtype=torch.bfloat16 and load_in_4bit=True.
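
A minimal sketch of that check, falling back to a plain role-prefixed prompt when no template is set (the fallback format is an assumption, not the format this checkpoint expects):

# Prefer the tokenizer's chat template when one is defined.
if tok.chat_template is not None:
    prompt = tok.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
else:
    prompt = "\n".join(f"{m['role']}: {m['content']}" for m in messages) + "\nassistant:"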

Citation

@misc{rapidfireai_mistral_7b_sft_bnb_4bit_2025,
  title        = {mistral-7b-sft-bnb-4bit (RapidFire AI)},
  author       = {RapidFire AI, Inc.},
  year         = {2025},
  howpublished = {\url{https://huggingface.co/rapidfire-ai-inc/mistral-7b-sft-bnb-4bit}}
}