---
license: apache-2.0
language:
- en
library_name: transformers
pipeline_tag: text-generation
tags:
- causal-lm
- text-generation
- bitsandbytes
- 4-bit
- nf4
- qlora
- peft
base_model: mistralai/Mistral-7B-v0.3
---

# rapidfire-ai-inc/mistral-7b-sft-bnb-4bit

> 4-bit NF4-quantized Mistral 7B (v0.3) checkpoint prepared for QLoRA fine-tuning and fast inference.

## TL;DR

- **Base model:** `mistralai/Mistral-7B-v0.3`
- **Quantization:** 4-bit **bitsandbytes** (NF4 + double quantization; bfloat16 compute)
- **Purpose:** Ready-to-use base for **QLoRA** fine-tuning; also suitable for lightweight inference
- **Suggested dtype:** `torch.bfloat16` compute with 4-bit weights

---

## Quickstart (Transformers + bitsandbytes)

```python
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig
import torch

model_id = "rapidfire-ai-inc/mistral-7b-sft-bnb-4bit"

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
)

tok = AutoTokenizer.from_pretrained(model_id, use_fast=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",
    quantization_config=bnb_config,
    torch_dtype=torch.bfloat16,
)

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Write a haiku about GPUs."},
]
prompt = tok.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

inputs = tok(prompt, return_tensors="pt").to(model.device)
out = model.generate(
    **inputs,
    max_new_tokens=128,
    do_sample=True,  # sampling must be enabled for temperature/top_p to take effect
    temperature=0.7,
    top_p=0.9,
)
print(tok.decode(out[0], skip_special_tokens=True))
```
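
For interactive use you can stream tokens as they are generated instead of waiting for the full completion. A minimal sketch reusing `tok`, `model`, and `inputs` from the Quickstart above (sampling settings simply mirror that example):

```python
from transformers import TextStreamer

# Print tokens to stdout as they are produced by generate().
streamer = TextStreamer(tok, skip_prompt=True, skip_special_tokens=True)
_ = model.generate(
    **inputs,
    max_new_tokens=128,
    do_sample=True,
    temperature=0.7,
    top_p=0.9,
    streamer=streamer,
)
```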

### BitsAndBytes (4-bit) config

```python
from transformers import BitsAndBytesConfig
import torch

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
)
```
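
To confirm the 4-bit load actually took effect, you can inspect the memory footprint and the layer classes. A minimal sketch, assuming `model` was loaded as in the Quickstart above (the layer path below assumes the standard Mistral architecture in Transformers):

```python
import bitsandbytes as bnb

# A 7B model stored in NF4 should occupy roughly 4-5 GB, versus ~14 GB in bf16.
print(f"{model.get_memory_footprint() / 1e9:.2f} GB")

# Linear layers are replaced by bitsandbytes Linear4bit modules when load_in_4bit=True.
q_proj = model.model.layers[0].self_attn.q_proj  # path assumes MistralForCausalLM
print(isinstance(q_proj, bnb.nn.Linear4bit))     # expected: True
```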

---

## Intended use & limitations

**Use cases.** A compact, QLoRA-ready starting point for supervised fine-tuning (SFT) or preference tuning, plus low-memory inference.
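
Because the weights are already stored in NF4, the checkpoint can be dropped straight into a QLoRA setup. A minimal sketch using `peft`, assuming `model` was loaded in 4-bit as in the Quickstart (the target modules and hyperparameters below are illustrative assumptions, not tuned values):

```python
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

# Prepare the quantized model for training (casts norms, enables input gradients).
model = prepare_model_for_kbit_training(model)

lora_config = LoraConfig(
    r=16,                  # illustrative rank
    lora_alpha=32,
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # attention projections
)

model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only the LoRA adapters are trainable
```

Training then proceeds with any standard trainer (for example TRL's `SFTTrainer`) on top of this wrapped model.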

**Limitations.** Inherits all behaviors and restrictions of `mistralai/Mistral-7B-v0.3`. May produce inaccurate or biased content. Do not deploy in high-risk settings without safeguards.

**License.** This repository follows the **apache-2.0** terms and the upstream model's license and acceptable-use policies.

---

## Notes

- Trained weights are unchanged aside from quantization; no additional fine-tuning was performed.
- Use `apply_chat_template` only if the upstream tokenizer provides a chat template (see the fallback sketch after this list).
- For best throughput on a single GPU, keep `torch_dtype=torch.bfloat16` and `load_in_4bit=True`.
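
If the tokenizer ships no chat template, `apply_chat_template` will fail; a minimal fallback sketch reusing `tok` from the Quickstart (the plain instruction-style format below is an assumption for illustration, not an official Mistral template):

```python
def build_prompt(messages):
    # Prefer the tokenizer's own chat template when one is available.
    if tok.chat_template is not None:
        return tok.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
    # Otherwise fall back to a simple instruction-style prompt (illustrative only).
    system = "\n".join(m["content"] for m in messages if m["role"] == "system")
    user = "\n".join(m["content"] for m in messages if m["role"] == "user")
    return f"{system}\n\n### Instruction:\n{user}\n\n### Response:\n"
```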

## Citation

```bibtex
@misc{rapidfireai_mistral_7b_sft_bnb_4bit_2025,
  title        = {mistral-7b-sft-bnb-4bit (RapidFire AI)},
  author       = {RapidFire AI, Inc.},
  year         = {2025},
  howpublished = {\url{https://huggingface.co/rapidfire-ai-inc/mistral-7b-sft-bnb-4bit}}
}
```