Upload folder using huggingface_hub
- README.md +193 -0
- chat_template.jinja +14 -0
- config.json +62 -0
- generation_config.json +8 -0
- model.safetensors +3 -0
- special_tokens_map.json +23 -0
- tokenizer.json +0 -0
- tokenizer_config.json +164 -0
README.md
ADDED
@@ -0,0 +1,193 @@
---
tags:
- unsloth
base_model:
- deepseek-ai/DeepSeek-Prover-V2-7B
---
<div>
<p style="margin-top: 0;margin-bottom: 0;">
<em><a href="https://docs.unsloth.ai/basics/unsloth-dynamic-v2.0-gguf">Unsloth Dynamic 2.0</a> achieves superior accuracy & outperforms other leading quants.</em>
</p>
<div style="display: flex; gap: 5px; align-items: center; ">
<a href="https://github.com/unslothai/unsloth/">
<img src="https://github.com/unslothai/unsloth/raw/main/images/unsloth%20new%20logo.png" width="133">
</a>
<a href="https://discord.gg/unsloth">
<img src="https://github.com/unslothai/unsloth/raw/main/images/Discord%20button.png" width="173">
</a>
<a href="https://docs.unsloth.ai/basics/qwen3-how-to-run-and-fine-tune">
<img src="https://raw.githubusercontent.com/unslothai/unsloth/refs/heads/main/images/documentation%20green%20button.png" width="143">
</a>
</div>
</div>

<!-- markdownlint-disable first-line-h1 -->
<!-- markdownlint-disable html -->
<!-- markdownlint-disable no-duplicate-header -->

<div align="center">
<img src="https://github.com/deepseek-ai/DeepSeek-V2/blob/main/figures/logo.svg?raw=true" width="60%" alt="DeepSeek-V3" />
</div>
<hr>
<div align="center" style="line-height: 1;">
<a href="https://www.deepseek.com/" target="_blank" style="margin: 2px;">
<img alt="Homepage" src="https://github.com/deepseek-ai/DeepSeek-V2/blob/main/figures/badge.svg?raw=true" style="display: inline-block; vertical-align: middle;"/>
</a>
<a href="https://chat.deepseek.com/" target="_blank" style="margin: 2px;">
<img alt="Chat" src="https://img.shields.io/badge/🤖%20Chat-DeepSeek%20V3-536af5?color=536af5&logoColor=white" style="display: inline-block; vertical-align: middle;"/>
</a>
<a href="https://huggingface.co/deepseek-ai" target="_blank" style="margin: 2px;">
<img alt="Hugging Face" src="https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-DeepSeek%20AI-ffc107?color=ffc107&logoColor=white" style="display: inline-block; vertical-align: middle;"/>
</a>
</div>

<div align="center" style="line-height: 1;">
<a href="https://discord.gg/Tc7c45Zzu5" target="_blank" style="margin: 2px;">
<img alt="Discord" src="https://img.shields.io/badge/Discord-DeepSeek%20AI-7289da?logo=discord&logoColor=white&color=7289da" style="display: inline-block; vertical-align: middle;"/>
</a>
<a href="https://github.com/deepseek-ai/DeepSeek-V2/blob/main/figures/qr.jpeg?raw=true" target="_blank" style="margin: 2px;">
<img alt="Wechat" src="https://img.shields.io/badge/WeChat-DeepSeek%20AI-brightgreen?logo=wechat&logoColor=white" style="display: inline-block; vertical-align: middle;"/>
</a>
<a href="https://twitter.com/deepseek_ai" target="_blank" style="margin: 2px;">
<img alt="Twitter Follow" src="https://img.shields.io/badge/Twitter-deepseek_ai-white?logo=x&logoColor=white" style="display: inline-block; vertical-align: middle;"/>
</a>
</div>

<div align="center" style="line-height: 1;">
<a href="https://github.com/deepseek-ai/DeepSeek-V3/blob/main/LICENSE-CODE" style="margin: 2px;">
<img alt="Code License" src="https://img.shields.io/badge/Code_License-MIT-f5de53?&color=f5de53" style="display: inline-block; vertical-align: middle;"/>
</a>
<a href="https://github.com/deepseek-ai/DeepSeek-V3/blob/main/LICENSE-MODEL" style="margin: 2px;">
<img alt="Model License" src="https://img.shields.io/badge/Model_License-Model_Agreement-f5de53?&color=f5de53" style="display: inline-block; vertical-align: middle;"/>
</a>
</div>

## 1. Introduction

We introduce DeepSeek-Prover-V2, an open-source large language model designed for formal theorem proving in Lean 4. Its initialization data is collected through a recursive theorem-proving pipeline powered by DeepSeek-V3. The cold-start training procedure begins by prompting DeepSeek-V3 to decompose complex problems into a series of subgoals. The proofs of the resolved subgoals are synthesized into a chain-of-thought process and combined with DeepSeek-V3's step-by-step reasoning to create the initial cold-start data for reinforcement learning. This process enables us to integrate both informal and formal mathematical reasoning into a unified model.

<p align="center">
<img width="100%" src="https://github.com/deepseek-ai/DeepSeek-Prover-V2/blob/main/figures/performance.png?raw=true">
</p>

## 2. Model Summary

---

**Synthesize Cold-Start Reasoning Data through Recursive Proof Search**

- To construct the cold-start dataset, we develop a simple yet effective pipeline for recursive theorem proving, utilizing DeepSeek-V3 as a unified tool for both subgoal decomposition and formalization. We prompt DeepSeek-V3 to decompose theorems into high-level proof sketches while simultaneously formalizing these proof steps in Lean 4, resulting in a sequence of subgoals.

- We use a smaller 7B model to handle the proof search for each subgoal, thereby reducing the associated computational burden. Once the decomposed steps of a challenging problem are resolved, we pair the complete step-by-step formal proof with the corresponding chain-of-thought from DeepSeek-V3 to create cold-start reasoning data.

---

**Reinforcement Learning with Synthetic Cold-Start Data**

- We curate a subset of challenging problems that the 7B prover model cannot solve end-to-end, but for which all decomposed subgoals have been successfully resolved. By composing the proofs of all subgoals, we construct a complete formal proof for the original problem. This proof is then appended to DeepSeek-V3's chain-of-thought, which outlines the corresponding lemma decomposition, thereby producing a cohesive synthesis of informal reasoning and subsequent formalization.

- After fine-tuning the prover model on the synthetic cold-start data, we perform a reinforcement learning stage to further enhance its ability to bridge informal reasoning with formal proof construction. Following the standard training objective for reasoning models, we use binary correct-or-incorrect feedback as the primary form of reward supervision.

- The resulting model, DeepSeek-Prover-V2-671B, achieves state-of-the-art performance in neural theorem proving, reaching an 88.9% pass ratio on the miniF2F-test and solving 49 out of 658 problems from PutnamBench. The proofs generated by DeepSeek-Prover-V2 for the miniF2F dataset are available for download as a [ZIP archive](https://github.com/deepseek-ai/DeepSeek-Prover-V2/blob/master/minif2f-solutions.zip).

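The decompose-then-compose idea described above can be illustrated with a toy Python sketch. The sketch makes several assumptions. Subgoals are written as single-line `have name : statement := by sorry` steps, and each subgoal proof is a one-line tactic. The helpers `extract_subgoals` and `compose_proof` are hypothetical illustrations and are not part of any DeepSeek release. A real pipeline would also need to carry earlier hypotheses into each subgoal and re-check the composed proof with Lean. The example sketch of the Quick Start problem is hand-written, so whether it compiles is not claimed here.

````python
from __future__ import annotations

import re

# Hypothetical convention: each subgoal is one line of the form
#   have <name> : <statement> := by sorry
SUBGOAL = re.compile(
    r"^(?P<indent>[ \t]*)have\s+(?P<name>\S+)\s*:\s*(?P<stmt>.+?)\s*:=\s*by\s+sorry[ \t]*$",
    re.MULTILINE,
)


def extract_subgoals(sketch: str) -> list[tuple[str, str]]:
    """Return (name, statement) pairs for every `have ... := by sorry` line."""
    return [(m["name"], m["stmt"]) for m in SUBGOAL.finditer(sketch)]


def compose_proof(sketch: str, subproofs: dict[str, str]) -> str:
    """Replace each subgoal's `sorry` with its one-line tactic proof."""

    def fill(m: re.Match) -> str:
        return f"{m['indent']}have {m['name']} : {m['stmt']} := by {subproofs[m['name']]}"

    return SUBGOAL.sub(fill, sketch)


# Hand-written decomposition of the miniF2F example used in the Quick Start section.
sketch = """theorem mathd_algebra_10 : abs ((120 : ℝ) / 100 * 30 - 130 / 100 * 20) = 10 := by
  have h₁ : (120 : ℝ) / 100 * 30 = 36 := by sorry
  have h₂ : (130 : ℝ) / 100 * 20 = 26 := by sorry
  rw [h₁, h₂]
  norm_num"""

print(extract_subgoals(sketch))
# Subgoal proofs as a (hypothetical) smaller prover might return them.
print(compose_proof(sketch, {"h₁": "norm_num", "h₂": "norm_num"}))
````

Splitting extraction from composition mirrors the division of labour in the text. One model proposes the sketch, and each extracted subgoal can be searched independently before the pieces are stitched back together.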
---

## 3. ProverBench: Formalization of AIME and Textbook Problems

We introduce ProverBench, a benchmark dataset comprising 325 problems. Of these, 15 are formalized from number theory and algebra questions featured in the recent AIME competitions (AIME 24 and 25), offering authentic high-school competition-level challenges. The remaining 310 problems are drawn from curated textbook examples and educational tutorials, contributing a diverse and pedagogically grounded collection of formalized mathematical problems. This benchmark is designed to enable more comprehensive evaluation across both high-school competition problems and undergraduate-level mathematics.

<div align="center">

| Area | Count |
| :---------------------: | :-------: |
| AIME 24&25 | 15 |
| Number Theory | 40 |
| Elementary Algebra | 30 |
| Linear Algebra | 50 |
| Abstract Algebra | 40 |
| Calculus | 90 |
| Real Analysis | 30 |
| Complex Analysis | 10 |
| Functional Analysis | 10 |
| Probability | 10 |
| Total | 325 |

</div>

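As a quick arithmetic check on the numbers above, the per-area counts in the table should sum to the 325 problems stated in the text. Removing the AIME row should leave the 310 textbook and tutorial problems. The dictionary below only restates the table and assumes nothing beyond it.

````python
# Per-area problem counts copied from the ProverBench table.
PROVERBENCH_COUNTS = {
    "AIME 24&25": 15,
    "Number Theory": 40,
    "Elementary Algebra": 30,
    "Linear Algebra": 50,
    "Abstract Algebra": 40,
    "Calculus": 90,
    "Real Analysis": 30,
    "Complex Analysis": 10,
    "Functional Analysis": 10,
    "Probability": 10,
}

total = sum(PROVERBENCH_COUNTS.values())
# Every row except AIME comes from textbook examples and tutorials.
textbook = total - PROVERBENCH_COUNTS["AIME 24&25"]
print(f"total={total}, AIME={PROVERBENCH_COUNTS['AIME 24&25']}, textbook={textbook}")
````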
## 4. Model & Dataset Downloads

We release DeepSeek-Prover-V2 in two model sizes: 7B and 671B parameters. DeepSeek-Prover-V2-671B is trained on top of DeepSeek-V3-Base. DeepSeek-Prover-V2-7B is built upon DeepSeek-Prover-V1.5-Base and features an extended context length of up to 32K tokens.

<div align="center">

| **Model** | **Download** |
| :-----------------------------: | :----------------------------------------------------------: |
| DeepSeek-Prover-V2-7B | [🤗 HuggingFace](https://huggingface.co/deepseek-ai/DeepSeek-Prover-V2-7B) |
| DeepSeek-Prover-V2-671B | [🤗 HuggingFace](https://huggingface.co/deepseek-ai/DeepSeek-Prover-V2-671B) |

</div>

<div align="center">

| **Dataset** | **Download** |
| :-----------------------------: | :----------------------------------------------------------: |
| DeepSeek-ProverBench | [🤗 HuggingFace](https://huggingface.co/datasets/deepseek-ai/DeepSeek-ProverBench) |

</div>

## 5. Quick Start

You can use [Hugging Face Transformers](https://github.com/huggingface/transformers) directly for model inference. DeepSeek-Prover-V2-671B shares the same architecture as DeepSeek-V3. For detailed information and supported features, please refer to [the DeepSeek-V3 documentation on Hugging Face](https://github.com/huggingface/transformers/blob/main/docs/source/en/model_doc/deepseek_v3.md).

The following is a basic example of generating a proof for a problem from the miniF2F dataset:
````python
import time

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

torch.manual_seed(30)

model_id = "deepseek-ai/DeepSeek-Prover-V2-7B"  # or "deepseek-ai/DeepSeek-Prover-V2-671B"
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Raw string so the LaTeX-style "\%" in the Lean docstring is kept verbatim.
formal_statement = r"""
import Mathlib
import Aesop

set_option maxHeartbeats 0

open BigOperators Real Nat Topology Rat

/-- What is the positive difference between $120\%$ of 30 and $130\%$ of 20? Show that it is 10.-/
theorem mathd_algebra_10 : abs ((120 : ℝ) / 100 * 30 - 130 / 100 * 20) = 10 := by
  sorry
""".strip()

prompt = """
Complete the following Lean 4 code:

```lean4
{}
```

Before producing the Lean 4 code to formally prove the given theorem, provide a detailed proof plan outlining the main proof steps and strategies.
The plan should highlight key ideas, intermediate lemmas, and proof structures that will guide the construction of the final formal proof.
""".strip()

chat = [
    {"role": "user", "content": prompt.format(formal_statement)},
]

model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto", torch_dtype=torch.bfloat16, trust_remote_code=True)
inputs = tokenizer.apply_chat_template(chat, tokenize=True, add_generation_prompt=True, return_tensors="pt").to(model.device)

start = time.time()
outputs = model.generate(inputs, max_new_tokens=8192)
# The decoded text contains the prompt followed by the generated proof plan and Lean code.
print(tokenizer.batch_decode(outputs))
print(time.time() - start)  # elapsed generation time in seconds
````

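The generated proof still has to be pulled out of the decoded text before it can be checked with Lean. The following is a minimal sketch under the assumption that the model answers with one or more fenced blocks opened by ```` ```lean4 ````, the same fence the prompt uses. It takes the last such block. `extract_lean_proof` is a hypothetical helper, not part of Transformers or any DeepSeek tooling.

````python
from __future__ import annotations

import re
from typing import Optional

# Matches fenced blocks opened with ```lean4, the fence used in the prompt.
LEAN4_BLOCK = re.compile(r"```lean4[ \t]*\n(.*?)```", re.DOTALL)


def extract_lean_proof(decoded: str) -> Optional[str]:
    """Return the body of the last ```lean4 block in `decoded`, or None."""
    blocks = LEAN4_BLOCK.findall(decoded)
    return blocks[-1].strip() if blocks else None
````

With the Quick Start variables, calling `extract_lean_proof(tokenizer.decode(outputs[0][inputs.shape[1]:], skip_special_tokens=True))` decodes only the newly generated tokens. This keeps the prompt's own `lean4` block, which still ends in `sorry`, out of the search. Picking the last block is a heuristic, and the extracted code should still be verified by Lean.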
## 6. License

The use of DeepSeek-Prover-V2 models is subject to [the Model License](LICENSE-MODEL).

## 7. Contact

If you have any questions, please raise an issue or contact us at [service@deepseek.com](mailto:service@deepseek.com).
chat_template.jinja
ADDED
@@ -0,0 +1,14 @@
{% if not add_generation_prompt is defined %}{% set add_generation_prompt = false %}{% endif %}{% set ns = namespace(is_first=false, is_tool=false, is_output_first=true, system_prompt='', is_first_sp=true, is_last_user=false) %}{%- for message in messages %}{%- if message['role'] == 'system' %}{%- if ns.is_first_sp %}{% set ns.system_prompt = ns.system_prompt + message['content'] %}{% set ns.is_first_sp = false %}{%- else %}{% set ns.system_prompt = ns.system_prompt + '

' + message['content'] %}{%- endif %}{%- endif %}{%- endfor %}{{ bos_token }}{{ ns.system_prompt }}{%- for message in messages %}{%- if message['role'] == 'user' %}{%- set ns.is_tool = false -%}{%- set ns.is_first = false -%}{%- set ns.is_last_user = true -%}{{'<|User|>' + message['content'] + '<|Assistant|>'}}{%- endif %}{%- if message['role'] == 'assistant' and message['tool_calls'] is defined and message['tool_calls'] is not none %}{%- set ns.is_last_user = false -%}{%- if ns.is_tool %}{{'<|tool▁outputs▁end|>'}}{%- endif %}{%- set ns.is_first = false %}{%- set ns.is_tool = false -%}{%- set ns.is_output_first = true %}{%- for tool in message['tool_calls'] %}{%- if not ns.is_first %}{%- if message['content'] is none %}{{'<|tool▁calls▁begin|><|tool▁call▁begin|>' + tool['type'] + '<|tool▁sep|>' + tool['function']['name'] + '
' + '```json' + '
' + tool['function']['arguments'] + '
' + '```' + '<|tool▁call▁end|>'}}{%- else %}{{message['content'] + '<|tool▁calls▁begin|><|tool▁call▁begin|>' + tool['type'] + '<|tool▁sep|>' + tool['function']['name'] + '
' + '```json' + '
' + tool['function']['arguments'] + '
' + '```' + '<|tool▁call▁end|>'}}{%- endif %}{%- set ns.is_first = true -%}{%- else %}{{'
' + '<|tool▁call▁begin|>' + tool['type'] + '<|tool▁sep|>' + tool['function']['name'] + '
' + '```json' + '
' + tool['function']['arguments'] + '
' + '```' + '<|tool▁call▁end|>'}}{%- endif %}{%- endfor %}{{'<|tool▁calls▁end|><|end▁of▁sentence|>'}}{%- endif %}{%- if message['role'] == 'assistant' and (message['tool_calls'] is not defined or message['tool_calls'] is none)%}{%- set ns.is_last_user = false -%}{%- if ns.is_tool %}{{'<|tool▁outputs▁end|>' + message['content'] + '<|end▁of▁sentence|>'}}{%- set ns.is_tool = false -%}{%- else %}{% set content = message['content'] %}{{content + '<|end▁of▁sentence|>'}}{%- endif %}{%- endif %}{%- if message['role'] == 'tool' %}{%- set ns.is_last_user = false -%}{%- set ns.is_tool = true -%}{%- if ns.is_output_first %}{{'<|tool▁outputs▁begin|><|tool▁output▁begin|>' + message['content'] + '<|tool▁output▁end|>'}}{%- set ns.is_output_first = false %}{%- else %}{{'
<|tool▁output▁begin|>' + message['content'] + '<|tool▁output▁end|>'}}{%- endif %}{%- endif %}{%- endfor -%}{% if ns.is_tool %}{{'<|tool▁outputs▁end|>'}}{% endif %}{% if add_generation_prompt and not ns.is_last_user and not ns.is_tool %}{{'<|Assistant|>'}}{% endif %}
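Ignoring tool calls and tool outputs, the template above renders the BOS token first. All system messages follow it, joined by a blank line. Each user turn is wrapped as the User tag, the content, then the Assistant tag. Each assistant turn is its content followed by the end-of-sentence token. The following is a simplified re-implementation of that non-tool path only. The special-token strings are written with the same `\uff5c` / `\u2581` escapes used in `tokenizer_config.json`, and `render_chat` is a hypothetical helper. For the single-user-message Quick Start chat, note that `add_generation_prompt=True` adds nothing extra, because the user branch already ends with the Assistant tag.

````python
from __future__ import annotations

# Special-token strings, using the escapes from tokenizer_config.json.
BOS = "<\uff5cbegin\u2581of\u2581sentence\uff5c>"
EOS = "<\uff5cend\u2581of\u2581sentence\uff5c>"
USER = "<\uff5cUser\uff5c>"
ASSISTANT = "<\uff5cAssistant\uff5c>"


def render_chat(messages: list[dict], add_generation_prompt: bool = False) -> str:
    """Simplified rendering of the chat template, ignoring tool calls/outputs."""
    # All system messages are joined with a blank line and placed right after BOS.
    system_prompt = "\n\n".join(m["content"] for m in messages if m["role"] == "system")
    text = BOS + system_prompt
    last_is_user = False
    for m in messages:
        if m["role"] == "user":
            # A user turn already opens the assistant turn.
            text += USER + m["content"] + ASSISTANT
            last_is_user = True
        elif m["role"] == "assistant":
            text += m["content"] + EOS
            last_is_user = False
    # The extra Assistant tag is only emitted when the last turn was not a user turn.
    if add_generation_prompt and not last_is_user:
        text += ASSISTANT
    return text
````

To check this reading, you can compare `render_chat(chat, add_generation_prompt=True)` with `tokenizer.apply_chat_template(chat, tokenize=False, add_generation_prompt=True)`.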
config.json
ADDED
@@ -0,0 +1,62 @@
{
  "architectures": [
    "LlamaForCausalLM"
  ],
  "attention_bias": false,
  "attention_dropout": 0.0,
  "bos_token_id": 100000,
  "eos_token_id": 100001,
  "head_dim": 128,
  "hidden_act": "silu",
  "hidden_size": 4096,
  "initializer_range": 0.02,
  "intermediate_size": 11008,
  "max_position_embeddings": 32768,
  "mlp_bias": false,
  "model_type": "llama",
  "num_attention_heads": 32,
  "num_hidden_layers": 30,
  "num_key_value_heads": 32,
  "pad_token_id": 100008,
  "pretraining_tp": 1,
  "quantization_config": {
    "_load_in_4bit": true,
    "_load_in_8bit": false,
    "bnb_4bit_compute_dtype": "bfloat16",
    "bnb_4bit_quant_storage": "uint8",
    "bnb_4bit_quant_type": "nf4",
    "bnb_4bit_use_double_quant": true,
    "llm_int8_enable_fp32_cpu_offload": false,
    "llm_int8_has_fp16_weight": false,
    "llm_int8_skip_modules": [
      "lm_head",
      "multi_modal_projector",
      "merger",
      "modality_projection",
      "model.layers.1.mlp",
      "model.layers.1.self_attn.o_proj",
      "model.layers.29.self_attn.o_proj"
    ],
    "llm_int8_threshold": 6.0,
    "load_in_4bit": true,
    "load_in_8bit": false,
    "quant_method": "bitsandbytes"
  },
  "rms_norm_eps": 1e-06,
  "rope_scaling": {
    "beta_fast": 32,
    "beta_slow": 1,
    "factor": 16,
    "mscale": true,
    "original_max_position_embeddings": 4096,
    "rope_type": "yarn",
    "type": "yarn"
  },
  "rope_theta": 10000,
  "tie_word_embeddings": false,
  "torch_dtype": "bfloat16",
  "transformers_version": "4.52.3",
  "unsloth_fixed": true,
  "use_cache": true,
  "vocab_size": 102400
}
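The `config.json` values above support a back-of-the-envelope check on this checkpoint. The sketch below counts parameters for a bias-free Llama decoder, consistent with `attention_bias` and `mlp_bias` being `false`. It then estimates the weight payload under two assumptions. First, every decoder linear layer is stored as 4-bit NF4, except the ones named in `llm_int8_skip_modules`, which stay in bf16 together with the embeddings, norms and `lm_head`. Second, per-block quantization constants are ignored. Under these assumptions the sketch gives 6,910,365,696 parameters and a lower bound of about 4.97 GB. That is a little below the 5,061,159,746-byte size in the `model.safetensors` LFS pointer of this upload, and the gap is presumably quantization state plus file metadata.

````python
# Architecture values copied from config.json.
CONFIG = {
    "hidden_size": 4096,
    "intermediate_size": 11008,
    "num_hidden_layers": 30,
    "num_attention_heads": 32,
    "num_key_value_heads": 32,
    "head_dim": 128,
    "vocab_size": 102400,
    "tie_word_embeddings": False,
}


def llama_param_breakdown(cfg: dict) -> dict:
    """Per-component parameter counts for a bias-free Llama decoder."""
    h, inter = cfg["hidden_size"], cfg["intermediate_size"]
    q_dim = cfg["num_attention_heads"] * cfg["head_dim"]
    kv_dim = cfg["num_key_value_heads"] * cfg["head_dim"]
    embed = cfg["vocab_size"] * h
    return {
        "attn_per_layer": h * q_dim + 2 * h * kv_dim + q_dim * h,  # q, k, v, o projections
        "mlp_per_layer": 3 * h * inter,  # gate, up, down projections
        "o_proj": q_dim * h,
        "norms": 2 * h * cfg["num_hidden_layers"] + h,  # two RMSNorms per layer + final norm
        "embed": embed,
        "lm_head": 0 if cfg["tie_word_embeddings"] else embed,
    }


b = llama_param_breakdown(CONFIG)
linear_total = CONFIG["num_hidden_layers"] * (b["attn_per_layer"] + b["mlp_per_layer"])
total = linear_total + b["norms"] + b["embed"] + b["lm_head"]

# Assumption: the skipped decoder linears (layer-1 MLP and two o_proj modules)
# stay in bf16; all other decoder linear layers are NF4.
skipped = b["mlp_per_layer"] + 2 * b["o_proj"]
nf4_params = linear_total - skipped
bf16_params = total - nf4_params

# 0.5 bytes per NF4 weight, 2 bytes per bf16 weight; quantization constants ignored.
approx_bytes = nf4_params // 2 + bf16_params * 2

print(f"total parameters: {total:,}")
print(f"estimated weight payload (lower bound): {approx_bytes / 1e9:.2f} GB")
````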
generation_config.json
ADDED
@@ -0,0 +1,8 @@
{
  "_from_model_config": true,
  "bos_token_id": 100000,
  "eos_token_id": 100001,
  "max_length": 32768,
  "pad_token_id": 100008,
  "transformers_version": "4.52.3"
}
model.safetensors
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:74a8df46f4baa3b469d0e3b7da15b534e3a386b27b5375f21dbc11f9173e3d12
size 5061159746
special_tokens_map.json
ADDED
@@ -0,0 +1,23 @@
{
  "bos_token": {
    "content": "<|begin▁of▁sentence|>",
    "lstrip": false,
    "normalized": true,
    "rstrip": false,
    "single_word": false
  },
  "eos_token": {
    "content": "<|end▁of▁sentence|>",
    "lstrip": false,
    "normalized": true,
    "rstrip": false,
    "single_word": false
  },
  "pad_token": {
    "content": "<|EOT|>",
    "lstrip": false,
    "normalized": true,
    "rstrip": false,
    "single_word": false
  }
}
tokenizer.json
ADDED
The diff for this file is too large to render.
tokenizer_config.json
ADDED
@@ -0,0 +1,164 @@
{
  "add_bos_token": true,
  "add_eos_token": false,
  "add_prefix_space": null,
  "added_tokens_decoder": {
    "100000": {
      "content": "<\uff5cbegin\u2581of\u2581sentence\uff5c>",
      "lstrip": false,
      "normalized": true,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "100001": {
      "content": "<\uff5cend\u2581of\u2581sentence\uff5c>",
      "lstrip": false,
      "normalized": true,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "100002": {
      "content": "<\uff5cfim\u2581hole\uff5c>",
      "lstrip": false,
      "normalized": true,
      "rstrip": false,
      "single_word": false,
      "special": false
    },
    "100003": {
      "content": "<\uff5cfim\u2581begin\uff5c>",
      "lstrip": false,
      "normalized": true,
      "rstrip": false,
      "single_word": false,
      "special": false
    },
    "100004": {
      "content": "<\uff5cfim\u2581end\uff5c>",
      "lstrip": false,
      "normalized": true,
      "rstrip": false,
      "single_word": false,
      "special": false
    },
    "100005": {
      "content": "<\uff5ccompletion\uff5c>",
      "lstrip": false,
      "normalized": true,
      "rstrip": false,
      "single_word": false,
      "special": false
    },
    "100006": {
      "content": "<\uff5cUser\uff5c>",
      "lstrip": false,
      "normalized": true,
      "rstrip": false,
      "single_word": false,
      "special": false
    },
    "100007": {
      "content": "<\uff5cAssistant\uff5c>",
      "lstrip": false,
      "normalized": true,
      "rstrip": false,
      "single_word": false,
      "special": false
    },
    "100008": {
      "content": "<|EOT|>",
      "lstrip": false,
      "normalized": true,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "100009": {
      "content": "<\uff5ctool\u2581calls\u2581begin\uff5c>",
      "lstrip": false,
      "normalized": true,
      "rstrip": false,
      "single_word": false,
      "special": false
    },
    "100010": {
      "content": "<\uff5ctool\u2581calls\u2581end\uff5c>",
      "lstrip": false,
      "normalized": true,
      "rstrip": false,
      "single_word": false,
      "special": false
    },
    "100011": {
      "content": "<\uff5ctool\u2581call\u2581begin\uff5c>",
      "lstrip": false,
      "normalized": true,
      "rstrip": false,
      "single_word": false,
      "special": false
    },
    "100012": {
      "content": "<\uff5ctool\u2581call\u2581end\uff5c>",
      "lstrip": false,
      "normalized": true,
      "rstrip": false,
      "single_word": false,
      "special": false
    },
    "100013": {
      "content": "<\uff5ctool\u2581outputs\u2581begin\uff5c>",
      "lstrip": false,
      "normalized": true,
      "rstrip": false,
      "single_word": false,
      "special": false
    },
    "100014": {
      "content": "<\uff5ctool\u2581outputs\u2581end\uff5c>",
      "lstrip": false,
      "normalized": true,
      "rstrip": false,
      "single_word": false,
      "special": false
    },
    "100015": {
      "content": "<\uff5ctool\u2581output\u2581begin\uff5c>",
      "lstrip": false,
      "normalized": true,
      "rstrip": false,
      "single_word": false,
      "special": false
    },
    "100016": {
      "content": "<\uff5ctool\u2581output\u2581end\uff5c>",
      "lstrip": false,
      "normalized": true,
      "rstrip": false,
      "single_word": false,
      "special": false
    },
    "100017": {
      "content": "<\uff5ctool\u2581sep\uff5c>",
      "lstrip": false,
      "normalized": true,
      "rstrip": false,
      "single_word": false,
      "special": false
    }
  },
  "bos_token": "<\uff5cbegin\u2581of\u2581sentence\uff5c>",
  "clean_up_tokenization_spaces": false,
  "eos_token": "<\uff5cend\u2581of\u2581sentence\uff5c>",
  "extra_special_tokens": {},
  "legacy": true,
  "model_max_length": 32768,
  "pad_token": "<|EOT|>",
  "padding_side": "left",
  "sp_model_kwargs": {},
  "tokenizer_class": "LlamaTokenizerFast",
  "unk_token": null,
  "use_default_system_prompt": false,
"chat_template": "{% if not add_generation_prompt is defined %}{% set add_generation_prompt = false %}{% endif %}{% set ns = namespace(is_first=false, is_tool=false, is_output_first=true, system_prompt='', is_first_sp=true, is_last_user=false) %}{%- for message in messages %}{%- if message['role'] == 'system' %}{%- if ns.is_first_sp %}{% set ns.system_prompt = ns.system_prompt + message['content'] %}{% set ns.is_first_sp = false %}{%- else %}{% set ns.system_prompt = ns.system_prompt + '\n\n' + message['content'] %}{%- endif %}{%- endif %}{%- endfor %}{{ bos_token }}{{ ns.system_prompt }}{%- for message in messages %}{%- if message['role'] == 'user' %}{%- set ns.is_tool = false -%}{%- set ns.is_first = false -%}{%- set ns.is_last_user = true -%}{{'<\uff5cUser\uff5c>' + message['content'] + '<\uff5cAssistant\uff5c>'}}{%- endif %}{%- if message['role'] == 'assistant' and message['tool_calls'] is defined and message['tool_calls'] is not none %}{%- set ns.is_last_user = false -%}{%- if ns.is_tool %}{{'<\uff5ctool\u2581outputs\u2581end\uff5c>'}}{%- endif %}{%- set ns.is_first = false %}{%- set ns.is_tool = false -%}{%- set ns.is_output_first = true %}{%- for tool in message['tool_calls'] %}{%- if not ns.is_first %}{%- if message['content'] is none %}{{'<\uff5ctool\u2581calls\u2581begin\uff5c><\uff5ctool\u2581call\u2581begin\uff5c>' + tool['type'] + '<\uff5ctool\u2581sep\uff5c>' + tool['function']['name'] + '\n' + '```json' + '\n' + tool['function']['arguments'] + '\n' + '```' + '<\uff5ctool\u2581call\u2581end\uff5c>'}}{%- else %}{{message['content'] + '<\uff5ctool\u2581calls\u2581begin\uff5c><\uff5ctool\u2581call\u2581begin\uff5c>' + tool['type'] + '<\uff5ctool\u2581sep\uff5c>' + tool['function']['name'] + '\n' + '```json' + '\n' + tool['function']['arguments'] + '\n' + '```' + '<\uff5ctool\u2581call\u2581end\uff5c>'}}{%- endif %}{%- set ns.is_first = true -%}{%- else %}{{'\n' + '<\uff5ctool\u2581call\u2581begin\uff5c>' + tool['type'] + '<\uff5ctool\u2581sep\uff5c>' + 
tool['function']['name'] + '\n' + '```json' + '\n' + tool['function']['arguments'] + '\n' + '```' + '<\uff5ctool\u2581call\u2581end\uff5c>'}}{%- endif %}{%- endfor %}{{'<\uff5ctool\u2581calls\u2581end\uff5c><\uff5cend\u2581of\u2581sentence\uff5c>'}}{%- endif %}{%- if message['role'] == 'assistant' and (message['tool_calls'] is not defined or message['tool_calls'] is none)%}{%- set ns.is_last_user = false -%}{%- if ns.is_tool %}{{'<\uff5ctool\u2581outputs\u2581end\uff5c>' + message['content'] + '<\uff5cend\u2581of\u2581sentence\uff5c>'}}{%- set ns.is_tool = false -%}{%- else %}{% set content = message['content'] %}{{content + '<\uff5cend\u2581of\u2581sentence\uff5c>'}}{%- endif %}{%- endif %}{%- if message['role'] == 'tool' %}{%- set ns.is_last_user = false -%}{%- set ns.is_tool = true -%}{%- if ns.is_output_first %}{{'<\uff5ctool\u2581outputs\u2581begin\uff5c><\uff5ctool\u2581output\u2581begin\uff5c>' + message['content'] + '<\uff5ctool\u2581output\u2581end\uff5c>'}}{%- set ns.is_output_first = false %}{%- else %}{{'\n<\uff5ctool\u2581output\u2581begin\uff5c>' + message['content'] + '<\uff5ctool\u2581output\u2581end\uff5c>'}}{%- endif %}{%- endif %}{%- endfor -%}{% if ns.is_tool %}{{'<\uff5ctool\u2581outputs\u2581end\uff5c>'}}{% endif %}{% if add_generation_prompt and not ns.is_last_user and not ns.is_tool %}{{'<\uff5cAssistant\uff5c>'}}{% endif %}"
}