---
base_model:
- ibm-granite/granite-4.0-micro
license: apache-2.0
library_name: transformers
tags:
- language
- granite-4.0
- abliterated
- uncensored
---

# huihui-ai/Huihui-granite-4.0-micro-abliterated

This is an uncensored version of [ibm-granite/granite-4.0-micro](https://huggingface.co/ibm-granite/granite-4.0-micro) created with abliteration (see [remove-refusals-with-transformers](https://github.com/Sumandora/remove-refusals-with-transformers) to learn more about the technique).
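
In broad terms, abliteration estimates a "refusal direction" in the model's hidden states (typically the difference between mean activations on harmful and harmless prompts) and then projects that direction out of the weights that write into the residual stream, so the model can no longer steer its output along it. The snippet below is only a rough, illustrative sketch of that core projection step, not the exact procedure used to produce this checkpoint; the commented-out application at the end assumes Llama-style module names, which may differ for this architecture.

```python
import torch

def orthogonalize(weight: torch.Tensor, refusal_dir: torch.Tensor) -> torch.Tensor:
    """Remove the component of a weight matrix that writes along the refusal direction.

    `weight` has shape [hidden_size, in_features] and writes into the residual stream;
    `refusal_dir` is a vector of shape [hidden_size]. Returns (I - r r^T) @ weight.
    """
    r = refusal_dir / refusal_dir.norm()
    return weight - torch.outer(r, r @ weight)

# Toy check: after the projection, the layer's output has no component along r.
W = torch.randn(16, 16)
r = torch.randn(16)
W_abl = orthogonalize(W, r)
x = torch.randn(16)
print(torch.dot(r / r.norm(), W_abl @ x).abs())  # ~0 up to floating-point error

# Hypothetical application to a Hugging Face causal LM (module names are assumptions):
# with torch.no_grad():
#     for layer in model.model.layers:
#         layer.self_attn.o_proj.weight.copy_(
#             orthogonalize(layer.self_attn.o_proj.weight, refusal_dir))
#         layer.mlp.down_proj.weight.copy_(
#             orthogonalize(layer.mlp.down_proj.weight, refusal_dir))
```

The checkpoint in this repository already has this kind of intervention baked into its weights, so no extra steps are needed at load time.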

## Usage

You can use this model in your applications by loading it with Hugging Face's `transformers` library:
```python
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig, TextStreamer
import torch
import os
import signal
import random
import numpy as np
import time
from collections import Counter

cpu_count = os.cpu_count()
print(f"Number of CPU cores in the system: {cpu_count}")
half_cpu_count = cpu_count // 2
os.environ["MKL_NUM_THREADS"] = str(half_cpu_count)
os.environ["OMP_NUM_THREADS"] = str(half_cpu_count)
torch.set_num_threads(half_cpu_count)

print(f"PyTorch threads: {torch.get_num_threads()}")
print(f"MKL threads: {os.getenv('MKL_NUM_THREADS')}")
print(f"OMP threads: {os.getenv('OMP_NUM_THREADS')}")

# Load the model and tokenizer
NEW_MODEL_ID = "huihui-ai/Huihui-granite-4.0-micro-abliterated"
print(f"Load Model {NEW_MODEL_ID} ... ")
quant_config_4 = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,
    llm_int8_enable_fp32_cpu_offload=True,
)

NUM_TRANS_LAYERS = 94

def create_device_map():
    device_map = {
        'model.embed_tokens': 0,
        'model.norm': 0,
        'lm_head': 0
    }
    for start, end, gpu_id in [(0, 4, 0), (4, 22, 1), (22, 40, 2), (40, 58, 3), (58, 76, 4), (76, 94, 5)]:
        for i in range(start, end):
            device_map[f'model.layers.{i}'] = gpu_id

    #for i in range(76, NUM_TRANS_LAYERS):
    #    device_map[f'model.layers.{i}'] = "cpu"

    return device_map

#device_map = create_device_map()

model = AutoModelForCausalLM.from_pretrained(
    NEW_MODEL_ID,
    device_map="balanced",
    trust_remote_code=True,
    #quantization_config=quant_config_4,
    torch_dtype=torch.bfloat16,
    low_cpu_mem_usage=True,
)
#print(model)
#print(model.config)

tokenizer = AutoTokenizer.from_pretrained(NEW_MODEL_ID, trust_remote_code=True)

messages = []
skip_prompt = True
skip_special_tokens = True

class CustomTextStreamer(TextStreamer):
    def __init__(self, tokenizer, skip_prompt=True, skip_special_tokens=True):
        super().__init__(tokenizer, skip_prompt=skip_prompt, skip_special_tokens=skip_special_tokens)
        self.generated_text = ""
        self.stop_flag = False
        self.init_time = time.time()    # Record initialization time
        self.end_time = None            # To store end time
        self.first_token_time = None    # To store first token generation time
        self.token_count = 0            # To track total tokens

    def on_finalized_text(self, text: str, stream_end: bool = False):
        if self.first_token_time is None and text.strip():  # Set first token time on first non-empty text
            self.first_token_time = time.time()
        self.generated_text += text
        # Count tokens in the generated text
        tokens = self.tokenizer.encode(text, add_special_tokens=False)
        self.token_count += len(tokens)
        print(text, end="", flush=True)
        if stream_end:
            self.end_time = time.time()  # Record end time when streaming ends
        if self.stop_flag:
            raise StopIteration

    def stop_generation(self):
        self.stop_flag = True
        self.end_time = time.time()  # Record end time when generation is stopped

    def get_metrics(self):
        """Returns initialization time, first token time, first token latency, end time, total time, total tokens, and tokens per second."""
        if self.end_time is None:
            self.end_time = time.time()  # Set end time if not already set
        total_time = self.end_time - self.init_time  # Total time from init to end
        tokens_per_second = self.token_count / total_time if total_time > 0 else 0
        first_token_latency = (self.first_token_time - self.init_time) if self.first_token_time is not None else None
        metrics = {
            "init_time": self.init_time,
            "first_token_time": self.first_token_time,
            "first_token_latency": first_token_latency,
            "end_time": self.end_time,
            "total_time": total_time,  # Total time in seconds
            "total_tokens": self.token_count,
            "tokens_per_second": tokens_per_second
        }
        return metrics

def generate_stream(model, tokenizer, messages, skip_prompt, skip_special_tokens, max_new_tokens):
    formatted_prompt = tokenizer.apply_chat_template(
        messages,
        tokenize=False,
        add_generation_prompt=True,
    )

    toks = tokenizer(
        formatted_prompt,
        return_tensors="pt",
    ).to(model.device)

    streamer = CustomTextStreamer(tokenizer, skip_prompt=skip_prompt, skip_special_tokens=skip_special_tokens)

    def signal_handler(sig, frame):
        streamer.stop_generation()
        print("\n[Generation stopped by user with Ctrl+C]")

    signal.signal(signal.SIGINT, signal_handler)

    generate_kwargs = {}

    print("Response: ", end="", flush=True)
    try:
        generated_ids = model.generate(
            **toks,
            max_new_tokens=max_new_tokens,
            streamer=streamer,
            #**generate_kwargs
        )
        del generated_ids
    except StopIteration:
        print("\n[Stopped by user]")

    del toks
    torch.cuda.empty_cache()
    signal.signal(signal.SIGINT, signal.SIG_DFL)

    return streamer.generated_text, streamer.stop_flag, streamer.get_metrics()

while True:
    print(f"skip_prompt: {skip_prompt}")
    print(f"skip_special_tokens: {skip_special_tokens}")

    user_input = input("User: ").strip()
    if user_input.lower() == "/exit":
        print("Exiting chat.")
        break
    if user_input.lower() == "/clear":
        messages = []
        print("Chat history cleared. Starting a new conversation.")
        continue
    if user_input.lower() == "/skip_prompt":
        skip_prompt = not skip_prompt
        continue
    if user_input.lower() == "/skip_special_tokens":
        skip_special_tokens = not skip_special_tokens
        continue
    if not user_input:
        print("Input cannot be empty. Please enter something.")
        continue

    messages.append({"role": "user", "content": user_input})

    response, stop_flag, metrics = generate_stream(model, tokenizer, messages, skip_prompt, skip_special_tokens, 40960)
    print("\n\nMetrics:")
    for key, value in metrics.items():
        print(f" {key}: {value}")

    print("", flush=True)
    if stop_flag:
        continue
    messages.append({"role": "assistant", "content": response})
```
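
In the interactive loop above, `/exit` quits, `/clear` resets the chat history, and `/skip_prompt` / `/skip_special_tokens` toggle how the streamed output is displayed.

If you only want a quick, non-interactive check, a minimal generation call could look like the sketch below (the prompt text is purely illustrative):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_id = "huihui-ai/Huihui-granite-4.0-micro-abliterated"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# Illustrative prompt only; replace with your own message.
messages = [{"role": "user", "content": "Write a short haiku about autumn."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(input_ids, max_new_tokens=128)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```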

## Usage Warnings

- **Risk of Sensitive or Controversial Outputs**: This model’s safety filtering has been significantly reduced, potentially generating sensitive, controversial, or inappropriate content. Users should exercise caution and rigorously review generated outputs.
- **Not Suitable for All Audiences**: Due to limited content filtering, the model’s outputs may be inappropriate for public settings, underage users, or applications requiring high security.
- **Legal and Ethical Responsibilities**: Users must ensure their usage complies with local laws and ethical standards. Generated content may carry legal or ethical risks, and users are solely responsible for any consequences.
- **Research and Experimental Use**: It is recommended to use this model for research, testing, or controlled environments, avoiding direct use in production or public-facing commercial applications.
- **Monitoring and Review Recommendations**: Users are strongly advised to monitor model outputs in real time and conduct manual reviews when necessary to prevent the dissemination of inappropriate content.
- **No Default Safety Guarantees**: Unlike standard models, this model has not undergone rigorous safety optimization. huihui.ai bears no responsibility for any consequences arising from its use.

### Donation

Your donation helps us continue our development and improvement; even a cup of coffee makes a difference.

- bitcoin:
```
bc1qqnkhuchxw0zqjh2ku3lu4hq45hc6gy84uk70ge
```
- Support our work on [Ko-fi](https://ko-fi.com/huihuiai)!