rapidfire-ai-inc/mistral-7b-sft-bnb-4bit
4-bit NF4 quantized Mistral 7B (v0.3) checkpoint prepared for QLoRA fine-tuning and fast inference.
TL;DR
- Base model: mistralai/Mistral-7B-v0.3
- Quantization: 4-bit bitsandbytes (NF4 + double quantization; bfloat16 compute)
- Purpose: Ready-to-use base for QLoRA fine-tuning; also suitable for lightweight inference
- Suggested dtype: torch.bfloat16 compute with 4-bit weights
Quickstart (Transformers + bitsandbytes)
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig
import torch

model_id = "rapidfire-ai-inc/mistral-7b-sft-bnb-4bit"

# 4-bit NF4 quantization with double quantization and bfloat16 compute
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
)

tok = AutoTokenizer.from_pretrained(model_id, use_fast=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",
    quantization_config=bnb_config,
    torch_dtype=torch.bfloat16,
)

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Write a haiku about GPUs."},
]
prompt = tok.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tok(prompt, return_tensors="pt").to(model.device)

out = model.generate(
    **inputs,
    max_new_tokens=128,
    do_sample=True,  # required for temperature/top_p to take effect
    temperature=0.7,
    top_p=0.9,
)
print(tok.decode(out[0], skip_special_tokens=True))
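As a quick sanity check that the 4-bit weights fit on your GPU, you can inspect the loaded footprint. This is a minimal sketch that assumes the model object from the Quickstart above is still in scope.

# Parameter/buffer footprint of the loaded 4-bit model, reported in bytes.
print(f"Model footprint: {model.get_memory_footprint() / 1e9:.2f} GB")
if torch.cuda.is_available():
    # Peak CUDA allocation observed so far (includes activations from generate()).
    print(f"Peak CUDA memory: {torch.cuda.max_memory_allocated() / 1e9:.2f} GB")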
BitsAndBytes (4-bit) config
from transformers import BitsAndBytesConfig
import torch
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
)
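Because the checkpoint ships already quantized, attaching LoRA adapters on top of the frozen 4-bit weights is the intended fine-tuning path. The sketch below is a minimal, non-authoritative example using the peft library; the target_modules list and LoRA hyperparameters are illustrative assumptions, not values used to produce this repository.

from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training
from transformers import AutoModelForCausalLM

# Load the 4-bit base (same bnb_config as above), then prepare it for k-bit training.
model = AutoModelForCausalLM.from_pretrained(
    "rapidfire-ai-inc/mistral-7b-sft-bnb-4bit",
    device_map="auto",
    quantization_config=bnb_config,
)
model = prepare_model_for_kbit_training(model)

# Illustrative LoRA settings; tune rank/alpha/dropout and target modules for your task.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # assumed attention projections
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only the adapter weights are trainable

From here the PEFT-wrapped model can be passed to a standard Trainer or TRL's SFTTrainer; the 4-bit base stays frozen and only the adapter weights receive gradients.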
Intended use & limitations
Use cases. A compact, QLoRA-ready starting point for supervised fine-tuning (SFT) or preference tuning, plus low-memory inference.
Limitations. Inherits all behaviors and restrictions of mistralai/Mistral-7B-v0.3. May produce inaccurate or biased content. Do not deploy in high-risk settings without safeguards.
License. This repository follows the Apache-2.0 terms and the upstream model's license and acceptable-use policies.
Notes
- Trained weights are unchanged aside from quantization; no additional fine-tuning was performed.
- Use apply_chat_template if the upstream tokenizer provides a chat template (a fallback sketch follows this list).
- For best throughput on a single GPU, keep torch_dtype=torch.bfloat16 and load_in_4bit=True.
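If the upstream tokenizer ships without a chat template, apply_chat_template may raise an error (behavior varies by transformers version). A minimal fallback, assuming the tok object from the Quickstart, is to check for a template and otherwise pass the prompt as plain text:

# Fall back to a plain-text prompt when no chat template is defined (assumes `tok` from the Quickstart).
user_message = "Write a haiku about GPUs."
if tok.chat_template is not None:
    prompt = tok.apply_chat_template(
        [{"role": "user", "content": user_message}],
        tokenize=False,
        add_generation_prompt=True,
    )
else:
    prompt = user_message  # base models are plain-text completers; no special formatting needed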
Citation
@misc{rapidfireai_mistral_7b_sft_bnb_4bit_bnb4bit_2025,
  title        = {mistral-7b-sft-bnb-4bit (RapidFire AI)},
  author       = {RapidFire AI, Inc.},
  year         = {2025},
  howpublished = {\url{https://huggingface.co/rapidfire-ai-inc/mistral-7b-sft-bnb-4bit}}
}