PEFT · Safetensors

91veMe4Plus committed ccfffe6 (verified) · parent: 9accf9d

Add files using upload-large-folder tool

Files changed (50)
  1. .DS_Store +0 -0
  2. README.md +202 -3
  3. adapter_config.json +39 -0
  4. added_tokens.json +35 -0
  5. chat_template.jinja +4 -0
  6. checkpoint-3000/README.md +202 -0
  7. checkpoint-3000/adapter_config.json +39 -0
  8. checkpoint-3000/added_tokens.json +35 -0
  9. checkpoint-3000/chat_template.jinja +4 -0
  10. checkpoint-3000/merges.txt +0 -0
  11. checkpoint-3000/special_tokens_map.json +86 -0
  12. checkpoint-3000/tokenizer.json +0 -0
  13. checkpoint-3000/tokenizer_config.json +501 -0
  14. checkpoint-3000/trainer_state.json +2884 -0
  15. checkpoint-3000/vocab.json +0 -0
  16. checkpoint-3200/README.md +202 -0
  17. checkpoint-3200/adapter_config.json +39 -0
  18. checkpoint-3200/added_tokens.json +35 -0
  19. checkpoint-3200/chat_template.jinja +4 -0
  20. checkpoint-3200/merges.txt +0 -0
  21. checkpoint-3200/rng_state.pth +3 -0
  22. checkpoint-3200/scaler.pt +3 -0
  23. checkpoint-3200/scheduler.pt +3 -0
  24. checkpoint-3200/special_tokens_map.json +86 -0
  25. checkpoint-3200/tokenizer.json +0 -0
  26. checkpoint-3200/tokenizer_config.json +501 -0
  27. checkpoint-3200/trainer_state.json +3074 -0
  28. checkpoint-3200/training_args.bin +3 -0
  29. checkpoint-3200/vocab.json +0 -0
  30. checkpoint-3375/README.md +202 -0
  31. checkpoint-3375/adapter_config.json +39 -0
  32. checkpoint-3375/adapter_model.safetensors +3 -0
  33. checkpoint-3375/added_tokens.json +35 -0
  34. checkpoint-3375/chat_template.jinja +4 -0
  35. checkpoint-3375/merges.txt +0 -0
  36. checkpoint-3375/rng_state.pth +3 -0
  37. checkpoint-3375/scaler.pt +3 -0
  38. checkpoint-3375/scheduler.pt +3 -0
  39. checkpoint-3375/special_tokens_map.json +86 -0
  40. checkpoint-3375/tokenizer.json +0 -0
  41. checkpoint-3375/tokenizer_config.json +501 -0
  42. checkpoint-3375/trainer_state.json +3227 -0
  43. checkpoint-3375/training_args.bin +3 -0
  44. checkpoint-3375/vocab.json +0 -0
  45. merges.txt +0 -0
  46. special_tokens_map.json +86 -0
  47. tokenizer.json +0 -0
  48. tokenizer_config.json +501 -0
  49. training_args.bin +3 -0
  50. vocab.json +0 -0
.DS_Store ADDED
Binary file (6.15 kB).
README.md CHANGED
@@ -1,3 +1,202 @@
- ---
- license: mit
- ---
+ ---
+ base_model: naver-hyperclovax/HyperCLOVAX-SEED-Text-Instruct-0.5B
+ library_name: peft
+ ---
+
+ # Model Card for Model ID
+
+ <!-- Provide a quick summary of what the model is/does. -->
+
+
+
+ ## Model Details
+
+ ### Model Description
+
+ <!-- Provide a longer summary of what this model is. -->
+
+
+
+ - **Developed by:** [More Information Needed]
+ - **Funded by [optional]:** [More Information Needed]
+ - **Shared by [optional]:** [More Information Needed]
+ - **Model type:** [More Information Needed]
+ - **Language(s) (NLP):** [More Information Needed]
+ - **License:** [More Information Needed]
+ - **Finetuned from model [optional]:** [More Information Needed]
+
+ ### Model Sources [optional]
+
+ <!-- Provide the basic links for the model. -->
+
+ - **Repository:** [More Information Needed]
+ - **Paper [optional]:** [More Information Needed]
+ - **Demo [optional]:** [More Information Needed]
+
+ ## Uses
+
+ <!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
+
+ ### Direct Use
+
+ <!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
+
+ [More Information Needed]
+
+ ### Downstream Use [optional]
+
+ <!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
+
+ [More Information Needed]
+
+ ### Out-of-Scope Use
+
+ <!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
+
+ [More Information Needed]
+
+ ## Bias, Risks, and Limitations
+
+ <!-- This section is meant to convey both technical and sociotechnical limitations. -->
+
+ [More Information Needed]
+
+ ### Recommendations
+
+ <!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
+
+ Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
+
+ ## How to Get Started with the Model
+
+ Use the code below to get started with the model.
+
+ [More Information Needed]
+
+ ## Training Details
+
+ ### Training Data
+
+ <!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
+
+ [More Information Needed]
+
+ ### Training Procedure
+
+ <!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
+
+ #### Preprocessing [optional]
+
+ [More Information Needed]
+
+
+ #### Training Hyperparameters
+
+ - **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
+
+ #### Speeds, Sizes, Times [optional]
+
+ <!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
+
+ [More Information Needed]
+
+ ## Evaluation
+
+ <!-- This section describes the evaluation protocols and provides the results. -->
+
+ ### Testing Data, Factors & Metrics
+
+ #### Testing Data
+
+ <!-- This should link to a Dataset Card if possible. -->
+
+ [More Information Needed]
+
+ #### Factors
+
+ <!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
+
+ [More Information Needed]
+
+ #### Metrics
+
+ <!-- These are the evaluation metrics being used, ideally with a description of why. -->
+
+ [More Information Needed]
+
+ ### Results
+
+ [More Information Needed]
+
+ #### Summary
+
+
+
+ ## Model Examination [optional]
+
+ <!-- Relevant interpretability work for the model goes here -->
+
+ [More Information Needed]
+
+ ## Environmental Impact
+
+ <!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
+
+ Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
+
+ - **Hardware Type:** [More Information Needed]
+ - **Hours used:** [More Information Needed]
+ - **Cloud Provider:** [More Information Needed]
+ - **Compute Region:** [More Information Needed]
+ - **Carbon Emitted:** [More Information Needed]
+
+ ## Technical Specifications [optional]
+
+ ### Model Architecture and Objective
+
+ [More Information Needed]
+
+ ### Compute Infrastructure
+
+ [More Information Needed]
+
+ #### Hardware
+
+ [More Information Needed]
+
+ #### Software
+
+ [More Information Needed]
+
+ ## Citation [optional]
+
+ <!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
+
+ **BibTeX:**
+
+ [More Information Needed]
+
+ **APA:**
+
+ [More Information Needed]
+
+ ## Glossary [optional]
+
+ <!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
+
+ [More Information Needed]
+
+ ## More Information [optional]
+
+ [More Information Needed]
+
+ ## Model Card Authors [optional]
+
+ [More Information Needed]
+
+ ## Model Card Contact
+
+ [More Information Needed]
+
+ ### Framework versions
+
+ - PEFT 0.15.2
adapter_config.json ADDED
@@ -0,0 +1,39 @@
+ {
+ "alpha_pattern": {},
+ "auto_mapping": null,
+ "base_model_name_or_path": "naver-hyperclovax/HyperCLOVAX-SEED-Text-Instruct-0.5B",
+ "bias": "none",
+ "corda_config": null,
+ "eva_config": null,
+ "exclude_modules": null,
+ "fan_in_fan_out": false,
+ "inference_mode": true,
+ "init_lora_weights": true,
+ "layer_replication": null,
+ "layers_pattern": null,
+ "layers_to_transform": null,
+ "loftq_config": {},
+ "lora_alpha": 32,
+ "lora_bias": false,
+ "lora_dropout": 0.1,
+ "megatron_config": null,
+ "megatron_core": "megatron.core",
+ "modules_to_save": null,
+ "peft_type": "LORA",
+ "r": 16,
+ "rank_pattern": {},
+ "revision": null,
+ "target_modules": [
+ "q_proj",
+ "v_proj",
+ "k_proj",
+ "up_proj",
+ "gate_proj",
+ "o_proj",
+ "down_proj"
+ ],
+ "task_type": "CAUSAL_LM",
+ "trainable_token_indices": null,
+ "use_dora": false,
+ "use_rslora": false
+ }
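The adapter config above wraps seven projection matrices with rank `r = 16` and `lora_alpha = 32`, i.e. an effective scaling of alpha / r = 2. As a rough sketch (the layer dimensions below are hypothetical placeholders, not taken from the 0.5B model), the parameter overhead LoRA adds per wrapped linear layer is:

```python
def lora_extra_params(d_in: int, d_out: int, r: int = 16) -> int:
    """Parameters added by LoRA to one (d_out x d_in) linear layer.

    LoRA learns two low-rank factors: A with shape (r, d_in) and
    B with shape (d_out, r); the frozen base weight is untouched.
    """
    return r * d_in + d_out * r

# Hypothetical square projection, for illustration only:
print(lora_extra_params(1024, 1024))  # -> 32768
print(32 / 16)                        # effective LoRA scaling alpha / r -> 2.0
```

Summed over the seven `target_modules` in every transformer block, this is what ends up in `adapter_model.safetensors`.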
added_tokens.json ADDED
@@ -0,0 +1,35 @@
+ {
+ "<EMAIL>": 110521,
+ "<KEY>": 110522,
+ "<NAME>": 110520,
+ "<PASSWORD>": 110523,
+ "<code_to_intermediate>": 110502,
+ "<empty_output>": 110501,
+ "<file_sep>": 110492,
+ "<intermediate_to_code>": 110503,
+ "<issue_closed>": 110495,
+ "<issue_comment>": 110494,
+ "<issue_start>": 110493,
+ "<jupyter_code>": 110498,
+ "<jupyter_output>": 110499,
+ "<jupyter_script>": 110500,
+ "<jupyter_start>": 110496,
+ "<jupyter_text>": 110497,
+ "<pr>": 110504,
+ "<pr_base>": 110507,
+ "<pr_base_code>": 110509,
+ "<pr_comment>": 110512,
+ "<pr_diff>": 110510,
+ "<pr_diff_hunk>": 110511,
+ "<pr_diff_hunk_comment_line>": 110519,
+ "<pr_event_id>": 110513,
+ "<pr_file>": 110508,
+ "<pr_in_reply_to_comment_id>": 110518,
+ "<pr_in_reply_to_review_id>": 110517,
+ "<pr_is_merged>": 110506,
+ "<pr_review>": 110514,
+ "<pr_review_comment>": 110516,
+ "<pr_review_state>": 110515,
+ "<pr_status>": 110505,
+ "<repo_name>": 110491
+ }
chat_template.jinja ADDED
@@ -0,0 +1,4 @@
+ {% if not add_generation_prompt is defined %}{% set add_generation_prompt = false %}{% endif %}{% for message in messages %}{{'<|im_start|>' + message['role'] + '
+ ' + message['content'] + '<|im_end|>' + '
+ '}}{% endfor %}{% if add_generation_prompt %}{{ '<|im_start|>assistant
+ ' }}{% endif %}
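The four-line Jinja template above renders ChatML-style turns. A pure-Python mimic (an illustration of the string the template produces, not the tokenizer's own code path) looks like:

```python
def apply_chat_template(messages, add_generation_prompt=False):
    # Mirrors the Jinja logic above: each turn becomes
    # <|im_start|>{role}\n{content}<|im_end|>\n
    out = ""
    for m in messages:
        out += "<|im_start|>" + m["role"] + "\n" + m["content"] + "<|im_end|>" + "\n"
    if add_generation_prompt:
        # Opens an assistant turn for the model to complete.
        out += "<|im_start|>assistant\n"
    return out

msgs = [{"role": "user", "content": "hello"}]
print(apply_chat_template(msgs, add_generation_prompt=True))
```

In practice the same result comes from `tokenizer.apply_chat_template(...)` once the tokenizer files in this commit are loaded.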
checkpoint-3000/README.md ADDED
@@ -0,0 +1,202 @@
(identical to the README.md diff above)
checkpoint-3000/adapter_config.json ADDED
@@ -0,0 +1,39 @@
(identical to the adapter_config.json diff above)
checkpoint-3000/added_tokens.json ADDED
@@ -0,0 +1,35 @@
(identical to the added_tokens.json diff above)
checkpoint-3000/chat_template.jinja ADDED
@@ -0,0 +1,4 @@
(identical to the chat_template.jinja diff above)
checkpoint-3000/merges.txt ADDED
The diff for this file is too large to render.
 
checkpoint-3000/special_tokens_map.json ADDED
@@ -0,0 +1,86 @@
+ {
+ "additional_special_tokens": [
+ "<|endoftext|>",
+ "<|fim_prefix|>",
+ "<|fim_middle|>",
+ "<|fim_suffix|>",
+ "<|endofprompt|>",
+ "<|_unuse_missing_100256|>",
+ "<|_unuse_missing_100261|>",
+ "<|_unuse_missing_100262|>",
+ "<|_unuse_missing_100263|>",
+ "<|_unuse_missing_100264|>",
+ "<|_unuse_missing_100265|>",
+ "<|_unuse_missing_100266|>",
+ "<|_unuse_missing_100267|>",
+ "<|_unuse_missing_100268|>",
+ "<|_unuse_missing_100269|>",
+ "<|_unuse_missing_100270|>",
+ "<|_unuse_missing_100271|>",
+ "<|im_start|>",
+ "<|im_end|>",
+ "<|stop|>",
+ "<|endofturn|>",
+ "<repo_name>",
+ "<file_sep>",
+ "<issue_start>",
+ "<issue_comment>",
+ "<issue_closed>",
+ "<jupyter_start>",
+ "<jupyter_text>",
+ "<jupyter_code>",
+ "<jupyter_output>",
+ "<jupyter_script>",
+ "<empty_output>",
+ "<code_to_intermediate>",
+ "<intermediate_to_code>",
+ "<pr>",
+ "<pr_status>",
+ "<pr_is_merged>",
+ "<pr_base>",
+ "<pr_file>",
+ "<pr_base_code>",
+ "<pr_diff>",
+ "<pr_diff_hunk>",
+ "<pr_comment>",
+ "<pr_event_id>",
+ "<pr_review>",
+ "<pr_review_state>",
+ "<pr_review_comment>",
+ "<pr_in_reply_to_review_id>",
+ "<pr_in_reply_to_comment_id>",
+ "<pr_diff_hunk_comment_line>",
+ "<NAME>",
+ "<EMAIL>",
+ "<KEY>",
+ "<PASSWORD>"
+ ],
+ "bos_token": {
+ "content": "<|endoftext|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false
+ },
+ "eos_token": {
+ "content": "<|endofturn|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false
+ },
+ "pad_token": {
+ "content": "<|endoftext|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false
+ },
+ "unk_token": {
+ "content": "<|endoftext|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false
+ }
+ }
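One detail worth noticing in the map above: bos, pad, and unk all reuse `<|endoftext|>`, while eos is the distinct `<|endofturn|>`. A quick check over the fragment (values copied from the JSON above):

```python
special = {  # token strings as declared in special_tokens_map.json
    "bos_token": "<|endoftext|>",
    "eos_token": "<|endofturn|>",
    "pad_token": "<|endoftext|>",
    "unk_token": "<|endoftext|>",
}

# Which roles collapse onto the shared <|endoftext|> token?
shared = {k for k, v in special.items() if v == "<|endoftext|>"}
print(sorted(shared))                                # bos/pad/unk share one token
print(special["eos_token"] != special["pad_token"])  # eos stays distinct -> True
```

Keeping eos distinct from pad matters during generation: stopping on `<|endofturn|>` is unambiguous even though padding reuses `<|endoftext|>`.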
checkpoint-3000/tokenizer.json ADDED
The diff for this file is too large to render.
 
checkpoint-3000/tokenizer_config.json ADDED
@@ -0,0 +1,501 @@
+ {
+ "add_prefix_space": false,
+ "added_tokens_decoder": {
+ "100256": {
+ "content": "<|_unuse_missing_100256|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "100257": {
+ "content": "<|endoftext|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "100258": {
+ "content": "<|fim_prefix|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "100259": {
+ "content": "<|fim_middle|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "100260": {
+ "content": "<|fim_suffix|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "100261": {
+ "content": "<|_unuse_missing_100261|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "100262": {
+ "content": "<|_unuse_missing_100262|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "100263": {
+ "content": "<|_unuse_missing_100263|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "100264": {
+ "content": "<|_unuse_missing_100264|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "100265": {
+ "content": "<|_unuse_missing_100265|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "100266": {
+ "content": "<|_unuse_missing_100266|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "100267": {
+ "content": "<|_unuse_missing_100267|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "100268": {
+ "content": "<|_unuse_missing_100268|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "100269": {
+ "content": "<|_unuse_missing_100269|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "100270": {
+ "content": "<|_unuse_missing_100270|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "100271": {
+ "content": "<|_unuse_missing_100271|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "100272": {
+ "content": "<|im_start|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "100273": {
+ "content": "<|im_end|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "100274": {
+ "content": "<|stop|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "100275": {
+ "content": "<|endofturn|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "100276": {
+ "content": "<|endofprompt|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "110491": {
+ "content": "<repo_name>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "110492": {
+ "content": "<file_sep>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "110493": {
+ "content": "<issue_start>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "110494": {
+ "content": "<issue_comment>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "110495": {
+ "content": "<issue_closed>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "110496": {
+ "content": "<jupyter_start>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "110497": {
+ "content": "<jupyter_text>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "110498": {
+ "content": "<jupyter_code>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "110499": {
+ "content": "<jupyter_output>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "110500": {
+ "content": "<jupyter_script>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "110501": {
+ "content": "<empty_output>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "110502": {
+ "content": "<code_to_intermediate>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "110503": {
+ "content": "<intermediate_to_code>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "110504": {
+ "content": "<pr>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "110505": {
+ "content": "<pr_status>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "110506": {
+ "content": "<pr_is_merged>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "110507": {
+ "content": "<pr_base>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "110508": {
+ "content": "<pr_file>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "110509": {
+ "content": "<pr_base_code>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "110510": {
+ "content": "<pr_diff>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "110511": {
+ "content": "<pr_diff_hunk>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "110512": {
+ "content": "<pr_comment>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "110513": {
+ "content": "<pr_event_id>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "110514": {
+ "content": "<pr_review>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "110515": {
+ "content": "<pr_review_state>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "110516": {
+ "content": "<pr_review_comment>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "110517": {
+ "content": "<pr_in_reply_to_review_id>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "110518": {
+ "content": "<pr_in_reply_to_comment_id>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "110519": {
+ "content": "<pr_diff_hunk_comment_line>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "110520": {
+ "content": "<NAME>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "110521": {
+ "content": "<EMAIL>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
420
+ "110522": {
421
+ "content": "<KEY>",
422
+ "lstrip": false,
423
+ "normalized": false,
424
+ "rstrip": false,
425
+ "single_word": false,
426
+ "special": true
427
+ },
428
+ "110523": {
429
+ "content": "<PASSWORD>",
430
+ "lstrip": false,
431
+ "normalized": false,
432
+ "rstrip": false,
433
+ "single_word": false,
434
+ "special": true
435
+ }
436
+ },
437
+ "additional_special_tokens": [
438
+ "<|endoftext|>",
439
+ "<|fim_prefix|>",
440
+ "<|fim_middle|>",
441
+ "<|fim_suffix|>",
442
+ "<|endofprompt|>",
443
+ "<|_unuse_missing_100256|>",
444
+ "<|_unuse_missing_100261|>",
445
+ "<|_unuse_missing_100262|>",
446
+ "<|_unuse_missing_100263|>",
447
+ "<|_unuse_missing_100264|>",
448
+ "<|_unuse_missing_100265|>",
449
+ "<|_unuse_missing_100266|>",
450
+ "<|_unuse_missing_100267|>",
451
+ "<|_unuse_missing_100268|>",
452
+ "<|_unuse_missing_100269|>",
453
+ "<|_unuse_missing_100270|>",
454
+ "<|_unuse_missing_100271|>",
455
+ "<|im_start|>",
456
+ "<|im_end|>",
457
+ "<|stop|>",
458
+ "<|endofturn|>",
459
+ "<repo_name>",
460
+ "<file_sep>",
461
+ "<issue_start>",
462
+ "<issue_comment>",
463
+ "<issue_closed>",
464
+ "<jupyter_start>",
465
+ "<jupyter_text>",
466
+ "<jupyter_code>",
467
+ "<jupyter_output>",
468
+ "<jupyter_script>",
469
+ "<empty_output>",
470
+ "<code_to_intermediate>",
471
+ "<intermediate_to_code>",
472
+ "<pr>",
473
+ "<pr_status>",
474
+ "<pr_is_merged>",
475
+ "<pr_base>",
476
+ "<pr_file>",
477
+ "<pr_base_code>",
478
+ "<pr_diff>",
479
+ "<pr_diff_hunk>",
480
+ "<pr_comment>",
481
+ "<pr_event_id>",
482
+ "<pr_review>",
483
+ "<pr_review_state>",
484
+ "<pr_review_comment>",
485
+ "<pr_in_reply_to_review_id>",
486
+ "<pr_in_reply_to_comment_id>",
487
+ "<pr_diff_hunk_comment_line>",
488
+ "<NAME>",
489
+ "<EMAIL>",
490
+ "<KEY>",
491
+ "<PASSWORD>"
492
+ ],
493
+ "bos_token": "<|endoftext|>",
494
+ "clean_up_tokenization_spaces": true,
495
+ "eos_token": "<|endofturn|>",
496
+ "extra_special_tokens": {},
497
+ "model_max_length": 1000000000000000019884624838656,
498
+ "pad_token": "<|endoftext|>",
499
+ "tokenizer_class": "GPT2Tokenizer",
500
+ "unk_token": "<|endoftext|>"
501
+ }
checkpoint-3000/trainer_state.json ADDED
@@ -0,0 +1,2884 @@
+ {
+ "best_global_step": 2200,
+ "best_metric": 1.8803235292434692,
+ "best_model_checkpoint": "/content/drive/MyDrive/hyperclova-deobfuscation-lora/checkpoint-2200",
+ "epoch": 2.6666666666666665,
+ "eval_steps": 200,
+ "global_step": 3000,
+ "is_hyper_param_search": false,
+ "is_local_process_zero": true,
+ "is_world_process_zero": true,
+ "log_history": [
+ {
+ "epoch": 0.008888888888888889,
+ "grad_norm": 3.629798412322998,
+ "learning_rate": 1.8e-05,
+ "loss": 4.1483,
+ "mean_token_accuracy": 0.34797456339001653,
+ "num_tokens": 11242.0,
+ "step": 10
+ },
+ {
+ "epoch": 0.017777777777777778,
+ "grad_norm": 2.6125221252441406,
+ "learning_rate": 3.8e-05,
+ "loss": 3.7515,
+ "mean_token_accuracy": 0.4058148756623268,
+ "num_tokens": 22106.0,
+ "step": 20
+ },
+ {
+ "epoch": 0.02666666666666667,
+ "grad_norm": 2.9313137531280518,
+ "learning_rate": 5.8e-05,
+ "loss": 3.3279,
+ "mean_token_accuracy": 0.4703808955848217,
+ "num_tokens": 33774.0,
+ "step": 30
+ },
+ {
+ "epoch": 0.035555555555555556,
+ "grad_norm": 2.0496416091918945,
+ "learning_rate": 7.800000000000001e-05,
+ "loss": 2.9114,
+ "mean_token_accuracy": 0.5239812344312668,
+ "num_tokens": 44943.0,
+ "step": 40
+ },
+ {
+ "epoch": 0.044444444444444446,
+ "grad_norm": 2.282668352127075,
+ "learning_rate": 9.8e-05,
+ "loss": 2.8468,
+ "mean_token_accuracy": 0.534189497679472,
+ "num_tokens": 56341.0,
+ "step": 50
+ },
+ {
+ "epoch": 0.05333333333333334,
+ "grad_norm": 2.168651819229126,
+ "learning_rate": 0.000118,
+ "loss": 2.7785,
+ "mean_token_accuracy": 0.5407359585165977,
+ "num_tokens": 67397.0,
+ "step": 60
+ },
+ {
+ "epoch": 0.06222222222222222,
+ "grad_norm": 2.289881467819214,
+ "learning_rate": 0.000138,
+ "loss": 2.736,
+ "mean_token_accuracy": 0.5326176360249519,
+ "num_tokens": 78482.0,
+ "step": 70
+ },
+ {
+ "epoch": 0.07111111111111111,
+ "grad_norm": 2.1038105487823486,
+ "learning_rate": 0.00015800000000000002,
+ "loss": 2.5855,
+ "mean_token_accuracy": 0.5618595249950886,
+ "num_tokens": 89803.0,
+ "step": 80
+ },
+ {
+ "epoch": 0.08,
+ "grad_norm": 2.24312686920166,
+ "learning_rate": 0.00017800000000000002,
+ "loss": 2.5365,
+ "mean_token_accuracy": 0.5661972932517528,
+ "num_tokens": 101015.0,
+ "step": 90
+ },
+ {
+ "epoch": 0.08888888888888889,
+ "grad_norm": 1.9482938051223755,
+ "learning_rate": 0.00019800000000000002,
+ "loss": 2.5634,
+ "mean_token_accuracy": 0.5538406319916248,
+ "num_tokens": 112364.0,
+ "step": 100
+ },
+ {
+ "epoch": 0.09777777777777778,
+ "grad_norm": 1.86210298538208,
+ "learning_rate": 0.00019945038167938932,
+ "loss": 2.4629,
+ "mean_token_accuracy": 0.5780388668179512,
+ "num_tokens": 122882.0,
+ "step": 110
+ },
+ {
+ "epoch": 0.10666666666666667,
+ "grad_norm": 1.8806918859481812,
+ "learning_rate": 0.00019883969465648855,
+ "loss": 2.5022,
+ "mean_token_accuracy": 0.563551553338766,
+ "num_tokens": 134028.0,
+ "step": 120
+ },
+ {
+ "epoch": 0.11555555555555555,
+ "grad_norm": 2.3264434337615967,
+ "learning_rate": 0.00019829007633587786,
+ "loss": 2.4065,
+ "mean_token_accuracy": 0.5807355619966984,
+ "num_tokens": 145192.0,
+ "step": 130
+ },
+ {
+ "epoch": 0.12444444444444444,
+ "grad_norm": 1.8537976741790771,
+ "learning_rate": 0.00019767938931297712,
+ "loss": 2.4838,
+ "mean_token_accuracy": 0.566282794624567,
+ "num_tokens": 156703.0,
+ "step": 140
+ },
+ {
+ "epoch": 0.13333333333333333,
+ "grad_norm": 2.0960652828216553,
+ "learning_rate": 0.00019706870229007636,
+ "loss": 2.4119,
+ "mean_token_accuracy": 0.5830203481018543,
+ "num_tokens": 168041.0,
+ "step": 150
+ },
+ {
+ "epoch": 0.14222222222222222,
+ "grad_norm": 2.2244813442230225,
+ "learning_rate": 0.00019645801526717557,
+ "loss": 2.3726,
+ "mean_token_accuracy": 0.5844443172216416,
+ "num_tokens": 178986.0,
+ "step": 160
+ },
+ {
+ "epoch": 0.1511111111111111,
+ "grad_norm": 1.8238722085952759,
+ "learning_rate": 0.0001958473282442748,
+ "loss": 2.4419,
+ "mean_token_accuracy": 0.5708602093160152,
+ "num_tokens": 190391.0,
+ "step": 170
+ },
+ {
+ "epoch": 0.16,
+ "grad_norm": 1.7154136896133423,
+ "learning_rate": 0.00019523664122137407,
+ "loss": 2.4293,
+ "mean_token_accuracy": 0.5748118035495281,
+ "num_tokens": 201989.0,
+ "step": 180
+ },
+ {
+ "epoch": 0.1688888888888889,
+ "grad_norm": 1.7582788467407227,
+ "learning_rate": 0.0001946259541984733,
+ "loss": 2.3577,
+ "mean_token_accuracy": 0.5877166777849198,
+ "num_tokens": 212914.0,
+ "step": 190
+ },
+ {
+ "epoch": 0.17777777777777778,
+ "grad_norm": 1.8613263368606567,
+ "learning_rate": 0.0001940152671755725,
+ "loss": 2.3486,
+ "mean_token_accuracy": 0.5889834299683571,
+ "num_tokens": 223936.0,
+ "step": 200
+ },
+ {
+ "epoch": 0.17777777777777778,
+ "eval_loss": 2.3320820331573486,
+ "eval_mean_token_accuracy": 0.5868698905706405,
+ "eval_num_tokens": 223936.0,
+ "eval_runtime": 49.2429,
+ "eval_samples_per_second": 20.307,
+ "eval_steps_per_second": 10.154,
+ "step": 200
+ },
+ {
+ "epoch": 0.18666666666666668,
+ "grad_norm": 1.8486477136611938,
+ "learning_rate": 0.00019340458015267175,
+ "loss": 2.3666,
+ "mean_token_accuracy": 0.5847611322999,
+ "num_tokens": 235036.0,
+ "step": 210
+ },
+ {
+ "epoch": 0.19555555555555557,
+ "grad_norm": 2.018049478530884,
+ "learning_rate": 0.000192793893129771,
+ "loss": 2.2689,
+ "mean_token_accuracy": 0.59971177354455,
+ "num_tokens": 246101.0,
+ "step": 220
+ },
+ {
+ "epoch": 0.20444444444444446,
+ "grad_norm": 1.7244890928268433,
+ "learning_rate": 0.00019218320610687024,
+ "loss": 2.3262,
+ "mean_token_accuracy": 0.5855986528098583,
+ "num_tokens": 257953.0,
+ "step": 230
+ },
+ {
+ "epoch": 0.21333333333333335,
+ "grad_norm": 1.8928934335708618,
+ "learning_rate": 0.00019157251908396948,
+ "loss": 2.3318,
+ "mean_token_accuracy": 0.5885626815259457,
+ "num_tokens": 269187.0,
+ "step": 240
+ },
+ {
+ "epoch": 0.2222222222222222,
+ "grad_norm": 1.7358920574188232,
+ "learning_rate": 0.0001909618320610687,
+ "loss": 2.2145,
+ "mean_token_accuracy": 0.6092555984854698,
+ "num_tokens": 279762.0,
+ "step": 250
+ },
+ {
+ "epoch": 0.2311111111111111,
+ "grad_norm": 1.6779032945632935,
+ "learning_rate": 0.00019035114503816795,
+ "loss": 2.3152,
+ "mean_token_accuracy": 0.584602715075016,
+ "num_tokens": 291454.0,
+ "step": 260
+ },
+ {
+ "epoch": 0.24,
+ "grad_norm": 1.6310207843780518,
+ "learning_rate": 0.0001897404580152672,
+ "loss": 2.2669,
+ "mean_token_accuracy": 0.5965895019471645,
+ "num_tokens": 302969.0,
+ "step": 270
+ },
+ {
+ "epoch": 0.24888888888888888,
+ "grad_norm": 1.6765615940093994,
+ "learning_rate": 0.00018912977099236642,
+ "loss": 2.269,
+ "mean_token_accuracy": 0.5934441670775413,
+ "num_tokens": 314204.0,
+ "step": 280
+ },
+ {
+ "epoch": 0.2577777777777778,
+ "grad_norm": 1.793959617614746,
+ "learning_rate": 0.00018851908396946566,
+ "loss": 2.2554,
+ "mean_token_accuracy": 0.600947193801403,
+ "num_tokens": 325649.0,
+ "step": 290
+ },
+ {
+ "epoch": 0.26666666666666666,
+ "grad_norm": 1.7492129802703857,
+ "learning_rate": 0.0001879083969465649,
+ "loss": 2.2157,
+ "mean_token_accuracy": 0.6022505328059197,
+ "num_tokens": 337167.0,
+ "step": 300
+ },
+ {
+ "epoch": 0.27555555555555555,
+ "grad_norm": 1.803576946258545,
+ "learning_rate": 0.00018729770992366413,
+ "loss": 2.2854,
+ "mean_token_accuracy": 0.5923042424023152,
+ "num_tokens": 348621.0,
+ "step": 310
+ },
+ {
+ "epoch": 0.28444444444444444,
+ "grad_norm": 1.9662351608276367,
+ "learning_rate": 0.00018668702290076337,
+ "loss": 2.2639,
+ "mean_token_accuracy": 0.588193366676569,
+ "num_tokens": 360272.0,
+ "step": 320
+ },
+ {
+ "epoch": 0.29333333333333333,
+ "grad_norm": 1.6725891828536987,
+ "learning_rate": 0.0001860763358778626,
+ "loss": 2.2249,
+ "mean_token_accuracy": 0.6054098337888718,
+ "num_tokens": 371346.0,
+ "step": 330
+ },
+ {
+ "epoch": 0.3022222222222222,
+ "grad_norm": 1.68416166305542,
+ "learning_rate": 0.00018546564885496184,
+ "loss": 2.1678,
+ "mean_token_accuracy": 0.6146526508033275,
+ "num_tokens": 382779.0,
+ "step": 340
+ },
+ {
+ "epoch": 0.3111111111111111,
+ "grad_norm": 1.7218507528305054,
+ "learning_rate": 0.00018485496183206108,
+ "loss": 2.2011,
+ "mean_token_accuracy": 0.6104303196072578,
+ "num_tokens": 393823.0,
+ "step": 350
+ },
+ {
+ "epoch": 0.32,
+ "grad_norm": 1.6817256212234497,
+ "learning_rate": 0.0001842442748091603,
+ "loss": 2.2264,
+ "mean_token_accuracy": 0.5987282857298851,
+ "num_tokens": 405438.0,
+ "step": 360
+ },
+ {
+ "epoch": 0.3288888888888889,
+ "grad_norm": 1.7454718351364136,
+ "learning_rate": 0.00018363358778625955,
+ "loss": 2.2712,
+ "mean_token_accuracy": 0.5939777493476868,
+ "num_tokens": 417299.0,
+ "step": 370
+ },
+ {
+ "epoch": 0.3377777777777778,
+ "grad_norm": 2.011315107345581,
+ "learning_rate": 0.00018302290076335878,
+ "loss": 2.2247,
+ "mean_token_accuracy": 0.6061037018895149,
+ "num_tokens": 428660.0,
+ "step": 380
+ },
+ {
+ "epoch": 0.3466666666666667,
+ "grad_norm": 1.6242053508758545,
+ "learning_rate": 0.00018241221374045802,
+ "loss": 2.232,
+ "mean_token_accuracy": 0.6062197655439376,
+ "num_tokens": 439768.0,
+ "step": 390
+ },
+ {
+ "epoch": 0.35555555555555557,
+ "grad_norm": 1.9328559637069702,
+ "learning_rate": 0.00018180152671755725,
+ "loss": 2.1291,
+ "mean_token_accuracy": 0.6168317429721355,
+ "num_tokens": 450808.0,
+ "step": 400
+ },
+ {
+ "epoch": 0.35555555555555557,
+ "eval_loss": 2.1662538051605225,
+ "eval_mean_token_accuracy": 0.6099509916305542,
+ "eval_num_tokens": 450808.0,
+ "eval_runtime": 49.4213,
+ "eval_samples_per_second": 20.234,
+ "eval_steps_per_second": 10.117,
+ "step": 400
+ },
+ {
+ "epoch": 0.36444444444444446,
+ "grad_norm": 1.8797143697738647,
+ "learning_rate": 0.0001811908396946565,
+ "loss": 2.2086,
+ "mean_token_accuracy": 0.6012695133686066,
+ "num_tokens": 461592.0,
+ "step": 410
+ },
+ {
+ "epoch": 0.37333333333333335,
+ "grad_norm": 1.7558225393295288,
+ "learning_rate": 0.00018058015267175575,
+ "loss": 2.1771,
+ "mean_token_accuracy": 0.6060668036341668,
+ "num_tokens": 473434.0,
+ "step": 420
+ },
+ {
+ "epoch": 0.38222222222222224,
+ "grad_norm": 1.845051884651184,
+ "learning_rate": 0.00017996946564885496,
+ "loss": 2.2576,
+ "mean_token_accuracy": 0.5929104581475257,
+ "num_tokens": 485130.0,
+ "step": 430
+ },
+ {
+ "epoch": 0.39111111111111113,
+ "grad_norm": 1.6992298364639282,
+ "learning_rate": 0.0001793587786259542,
+ "loss": 2.1815,
+ "mean_token_accuracy": 0.6100690707564353,
+ "num_tokens": 496482.0,
+ "step": 440
+ },
+ {
+ "epoch": 0.4,
+ "grad_norm": 1.7239253520965576,
+ "learning_rate": 0.00017874809160305343,
+ "loss": 2.2082,
+ "mean_token_accuracy": 0.6001435503363609,
+ "num_tokens": 508218.0,
+ "step": 450
+ },
+ {
+ "epoch": 0.4088888888888889,
+ "grad_norm": 1.7856336832046509,
+ "learning_rate": 0.0001781374045801527,
+ "loss": 2.1593,
+ "mean_token_accuracy": 0.6118309393525123,
+ "num_tokens": 519379.0,
+ "step": 460
+ },
+ {
+ "epoch": 0.4177777777777778,
+ "grad_norm": 1.611831545829773,
+ "learning_rate": 0.00017752671755725193,
+ "loss": 2.1797,
+ "mean_token_accuracy": 0.6033190444111824,
+ "num_tokens": 530561.0,
+ "step": 470
+ },
+ {
+ "epoch": 0.4266666666666667,
+ "grad_norm": 1.7420586347579956,
+ "learning_rate": 0.00017691603053435114,
+ "loss": 2.2027,
+ "mean_token_accuracy": 0.6067790001630783,
+ "num_tokens": 542631.0,
+ "step": 480
+ },
+ {
+ "epoch": 0.43555555555555553,
+ "grad_norm": 1.948723316192627,
+ "learning_rate": 0.00017630534351145038,
+ "loss": 2.1753,
+ "mean_token_accuracy": 0.6109650492668152,
+ "num_tokens": 553477.0,
+ "step": 490
+ },
+ {
+ "epoch": 0.4444444444444444,
+ "grad_norm": 1.7983819246292114,
+ "learning_rate": 0.00017569465648854964,
+ "loss": 2.158,
+ "mean_token_accuracy": 0.5996212616562844,
+ "num_tokens": 565400.0,
+ "step": 500
+ },
+ {
+ "epoch": 0.4533333333333333,
+ "grad_norm": 1.842372179031372,
+ "learning_rate": 0.00017508396946564888,
+ "loss": 2.0825,
+ "mean_token_accuracy": 0.6168116196990013,
+ "num_tokens": 576953.0,
+ "step": 510
+ },
+ {
+ "epoch": 0.4622222222222222,
+ "grad_norm": 1.91799795627594,
+ "learning_rate": 0.00017447328244274809,
+ "loss": 2.1022,
+ "mean_token_accuracy": 0.6168905258178711,
+ "num_tokens": 588003.0,
+ "step": 520
+ },
+ {
+ "epoch": 0.4711111111111111,
+ "grad_norm": 1.7727124691009521,
+ "learning_rate": 0.00017386259541984732,
+ "loss": 2.1695,
+ "mean_token_accuracy": 0.5997609972953797,
+ "num_tokens": 600043.0,
+ "step": 530
+ },
+ {
+ "epoch": 0.48,
+ "grad_norm": 1.8602296113967896,
+ "learning_rate": 0.00017325190839694658,
+ "loss": 2.0849,
+ "mean_token_accuracy": 0.6266478568315506,
+ "num_tokens": 610974.0,
+ "step": 540
+ },
+ {
+ "epoch": 0.4888888888888889,
+ "grad_norm": 1.545620083808899,
+ "learning_rate": 0.00017264122137404582,
+ "loss": 2.1824,
+ "mean_token_accuracy": 0.6072694823145867,
+ "num_tokens": 622632.0,
+ "step": 550
+ },
+ {
+ "epoch": 0.49777777777777776,
+ "grad_norm": 1.7485988140106201,
+ "learning_rate": 0.00017203053435114506,
+ "loss": 2.1374,
+ "mean_token_accuracy": 0.6164417043328285,
+ "num_tokens": 634093.0,
+ "step": 560
+ },
+ {
+ "epoch": 0.5066666666666667,
+ "grad_norm": 1.8591196537017822,
+ "learning_rate": 0.00017141984732824426,
+ "loss": 2.0928,
+ "mean_token_accuracy": 0.6241554819047451,
+ "num_tokens": 645226.0,
+ "step": 570
+ },
+ {
+ "epoch": 0.5155555555555555,
+ "grad_norm": 1.8163517713546753,
+ "learning_rate": 0.00017080916030534353,
+ "loss": 2.0476,
+ "mean_token_accuracy": 0.6285594403743744,
+ "num_tokens": 656188.0,
+ "step": 580
+ },
+ {
+ "epoch": 0.5244444444444445,
+ "grad_norm": 1.7729696035385132,
+ "learning_rate": 0.00017019847328244276,
+ "loss": 2.1036,
+ "mean_token_accuracy": 0.6208315283060074,
+ "num_tokens": 667642.0,
+ "step": 590
+ },
+ {
+ "epoch": 0.5333333333333333,
+ "grad_norm": 1.7804032564163208,
+ "learning_rate": 0.000169587786259542,
+ "loss": 2.1174,
+ "mean_token_accuracy": 0.6148250237107277,
+ "num_tokens": 678769.0,
+ "step": 600
+ },
+ {
+ "epoch": 0.5333333333333333,
+ "eval_loss": 2.0850696563720703,
+ "eval_mean_token_accuracy": 0.6197466601729393,
+ "eval_num_tokens": 678769.0,
+ "eval_runtime": 49.7611,
+ "eval_samples_per_second": 20.096,
+ "eval_steps_per_second": 10.048,
+ "step": 600
+ },
+ {
+ "epoch": 0.5422222222222223,
+ "grad_norm": 1.8643274307250977,
+ "learning_rate": 0.00016897709923664124,
+ "loss": 2.0485,
+ "mean_token_accuracy": 0.6331146821379662,
+ "num_tokens": 690014.0,
+ "step": 610
+ },
+ {
+ "epoch": 0.5511111111111111,
+ "grad_norm": 1.8060939311981201,
+ "learning_rate": 0.00016836641221374047,
+ "loss": 2.1117,
+ "mean_token_accuracy": 0.612041813135147,
+ "num_tokens": 701734.0,
+ "step": 620
+ },
+ {
+ "epoch": 0.56,
+ "grad_norm": 1.7059085369110107,
+ "learning_rate": 0.0001677557251908397,
+ "loss": 2.0747,
+ "mean_token_accuracy": 0.6174572542309761,
+ "num_tokens": 713570.0,
+ "step": 630
+ },
+ {
+ "epoch": 0.5688888888888889,
+ "grad_norm": 1.6600592136383057,
+ "learning_rate": 0.00016714503816793894,
+ "loss": 2.0685,
+ "mean_token_accuracy": 0.6293445661664009,
+ "num_tokens": 724815.0,
+ "step": 640
+ },
+ {
+ "epoch": 0.5777777777777777,
+ "grad_norm": 1.6598913669586182,
+ "learning_rate": 0.00016653435114503818,
+ "loss": 2.0255,
+ "mean_token_accuracy": 0.6309839904308319,
+ "num_tokens": 735777.0,
+ "step": 650
+ },
+ {
+ "epoch": 0.5866666666666667,
+ "grad_norm": 1.8306963443756104,
+ "learning_rate": 0.00016592366412213741,
+ "loss": 2.1249,
+ "mean_token_accuracy": 0.6147443532943726,
+ "num_tokens": 746903.0,
+ "step": 660
+ },
+ {
+ "epoch": 0.5955555555555555,
+ "grad_norm": 1.626795768737793,
+ "learning_rate": 0.00016531297709923665,
+ "loss": 2.0694,
+ "mean_token_accuracy": 0.6254988595843315,
+ "num_tokens": 757881.0,
+ "step": 670
+ },
+ {
+ "epoch": 0.6044444444444445,
+ "grad_norm": 1.710806131362915,
+ "learning_rate": 0.00016470229007633589,
+ "loss": 2.0397,
+ "mean_token_accuracy": 0.6233279958367348,
+ "num_tokens": 768982.0,
+ "step": 680
+ },
+ {
+ "epoch": 0.6133333333333333,
+ "grad_norm": 1.7051280736923218,
+ "learning_rate": 0.00016409160305343512,
+ "loss": 2.116,
+ "mean_token_accuracy": 0.6183760315179825,
+ "num_tokens": 780072.0,
+ "step": 690
+ },
+ {
+ "epoch": 0.6222222222222222,
+ "grad_norm": 1.607917070388794,
+ "learning_rate": 0.00016348091603053436,
+ "loss": 2.0478,
+ "mean_token_accuracy": 0.6331974640488625,
+ "num_tokens": 791061.0,
+ "step": 700
+ },
+ {
+ "epoch": 0.6311111111111111,
+ "grad_norm": 1.7803592681884766,
+ "learning_rate": 0.0001628702290076336,
+ "loss": 2.0595,
+ "mean_token_accuracy": 0.6249041527509689,
+ "num_tokens": 801867.0,
+ "step": 710
+ },
+ {
+ "epoch": 0.64,
+ "grad_norm": 1.6132373809814453,
+ "learning_rate": 0.00016225954198473283,
+ "loss": 2.0789,
+ "mean_token_accuracy": 0.6235784366726875,
+ "num_tokens": 813112.0,
+ "step": 720
+ },
+ {
+ "epoch": 0.6488888888888888,
+ "grad_norm": 1.790528655052185,
+ "learning_rate": 0.00016164885496183207,
+ "loss": 2.0632,
+ "mean_token_accuracy": 0.6268924325704575,
+ "num_tokens": 824133.0,
+ "step": 730
+ },
+ {
+ "epoch": 0.6577777777777778,
+ "grad_norm": 2.0007362365722656,
+ "learning_rate": 0.0001610381679389313,
+ "loss": 2.0701,
+ "mean_token_accuracy": 0.6189413338899612,
+ "num_tokens": 835469.0,
+ "step": 740
+ },
+ {
+ "epoch": 0.6666666666666666,
+ "grad_norm": 2.227158546447754,
+ "learning_rate": 0.00016042748091603054,
+ "loss": 2.0339,
+ "mean_token_accuracy": 0.621903920173645,
+ "num_tokens": 846572.0,
+ "step": 750
+ },
+ {
+ "epoch": 0.6755555555555556,
+ "grad_norm": 1.80472731590271,
+ "learning_rate": 0.00015981679389312977,
+ "loss": 2.1285,
+ "mean_token_accuracy": 0.604806374013424,
+ "num_tokens": 857795.0,
+ "step": 760
+ },
+ {
+ "epoch": 0.6844444444444444,
+ "grad_norm": 1.7893937826156616,
+ "learning_rate": 0.000159206106870229,
+ "loss": 2.0347,
+ "mean_token_accuracy": 0.6292635962367058,
+ "num_tokens": 868429.0,
+ "step": 770
+ },
+ {
+ "epoch": 0.6933333333333334,
+ "grad_norm": 1.6761573553085327,
+ "learning_rate": 0.00015859541984732824,
+ "loss": 2.0591,
+ "mean_token_accuracy": 0.6254431992769242,
+ "num_tokens": 879659.0,
+ "step": 780
+ },
+ {
+ "epoch": 0.7022222222222222,
+ "grad_norm": 1.803045630455017,
+ "learning_rate": 0.0001579847328244275,
+ "loss": 2.0293,
+ "mean_token_accuracy": 0.6273573949933052,
+ "num_tokens": 890911.0,
+ "step": 790
+ },
+ {
+ "epoch": 0.7111111111111111,
+ "grad_norm": 1.7385220527648926,
+ "learning_rate": 0.00015737404580152672,
+ "loss": 2.0197,
+ "mean_token_accuracy": 0.63025072067976,
+ "num_tokens": 902240.0,
+ "step": 800
+ },
+ {
+ "epoch": 0.7111111111111111,
+ "eval_loss": 2.0297935009002686,
+ "eval_mean_token_accuracy": 0.628437293112278,
+ "eval_num_tokens": 902240.0,
+ "eval_runtime": 49.3011,
+ "eval_samples_per_second": 20.284,
+ "eval_steps_per_second": 10.142,
+ "step": 800
+ },
+ {
+ "epoch": 0.72,
+ "grad_norm": 1.8906656503677368,
+ "learning_rate": 0.00015676335877862595,
+ "loss": 2.0806,
+ "mean_token_accuracy": 0.619849094748497,
+ "num_tokens": 914009.0,
+ "step": 810
+ },
+ {
+ "epoch": 0.7288888888888889,
+ "grad_norm": 1.714268684387207,
+ "learning_rate": 0.0001561526717557252,
+ "loss": 2.0343,
+ "mean_token_accuracy": 0.632188580930233,
+ "num_tokens": 925091.0,
+ "step": 820
+ },
+ {
+ "epoch": 0.7377777777777778,
+ "grad_norm": 1.833918809890747,
+ "learning_rate": 0.00015554198473282445,
+ "loss": 2.0747,
+ "mean_token_accuracy": 0.6280180156230927,
+ "num_tokens": 936675.0,
+ "step": 830
+ },
+ {
+ "epoch": 0.7466666666666667,
+ "grad_norm": 1.9817575216293335,
+ "learning_rate": 0.00015493129770992366,
+ "loss": 2.0859,
+ "mean_token_accuracy": 0.6128378361463547,
+ "num_tokens": 948151.0,
+ "step": 840
+ },
+ {
+ "epoch": 0.7555555555555555,
+ "grad_norm": 1.5982656478881836,
+ "learning_rate": 0.0001543206106870229,
+ "loss": 2.0455,
+ "mean_token_accuracy": 0.6276382938027382,
+ "num_tokens": 959266.0,
+ "step": 850
+ },
+ {
+ "epoch": 0.7644444444444445,
+ "grad_norm": 1.7298970222473145,
+ "learning_rate": 0.00015370992366412213,
+ "loss": 1.9604,
+ "mean_token_accuracy": 0.6377590849995614,
+ "num_tokens": 970339.0,
+ "step": 860
+ },
+ {
+ "epoch": 0.7733333333333333,
+ "grad_norm": 1.8064581155776978,
+ "learning_rate": 0.0001530992366412214,
+ "loss": 2.0698,
+ "mean_token_accuracy": 0.6194617792963981,
+ "num_tokens": 981805.0,
+ "step": 870
+ },
+ {
+ "epoch": 0.7822222222222223,
+ "grad_norm": 1.5860410928726196,
+ "learning_rate": 0.00015248854961832063,
+ "loss": 2.0182,
+ "mean_token_accuracy": 0.6292306095361709,
+ "num_tokens": 993552.0,
+ "step": 880
+ },
+ {
+ "epoch": 0.7911111111111111,
+ "grad_norm": 1.8761259317398071,
+ "learning_rate": 0.00015187786259541984,
+ "loss": 2.0335,
849
+ "mean_token_accuracy": 0.6285651385784149,
850
+ "num_tokens": 1004400.0,
851
+ "step": 890
852
+ },
853
+ {
854
+ "epoch": 0.8,
855
+ "grad_norm": 1.6973590850830078,
856
+ "learning_rate": 0.00015126717557251908,
857
+ "loss": 2.0927,
858
+ "mean_token_accuracy": 0.6183614790439605,
859
+ "num_tokens": 1015564.0,
860
+ "step": 900
861
+ },
862
+ {
863
+ "epoch": 0.8088888888888889,
864
+ "grad_norm": 1.6477675437927246,
865
+ "learning_rate": 0.00015065648854961834,
866
+ "loss": 1.9187,
867
+ "mean_token_accuracy": 0.6427812784910202,
868
+ "num_tokens": 1026849.0,
869
+ "step": 910
870
+ },
871
+ {
872
+ "epoch": 0.8177777777777778,
873
+ "grad_norm": 1.6942589282989502,
874
+ "learning_rate": 0.00015004580152671757,
875
+ "loss": 2.0139,
876
+ "mean_token_accuracy": 0.6322552219033242,
877
+ "num_tokens": 1037721.0,
878
+ "step": 920
879
+ },
880
+ {
881
+ "epoch": 0.8266666666666667,
882
+ "grad_norm": 1.6394822597503662,
883
+ "learning_rate": 0.0001494351145038168,
884
+ "loss": 2.0392,
885
+ "mean_token_accuracy": 0.6273665294051171,
886
+ "num_tokens": 1048986.0,
887
+ "step": 930
888
+ },
889
+ {
890
+ "epoch": 0.8355555555555556,
891
+ "grad_norm": 1.697804570198059,
892
+ "learning_rate": 0.00014882442748091602,
893
+ "loss": 2.0412,
894
+ "mean_token_accuracy": 0.625536386668682,
895
+ "num_tokens": 1060627.0,
896
+ "step": 940
897
+ },
898
+ {
899
+ "epoch": 0.8444444444444444,
900
+ "grad_norm": 1.8058092594146729,
901
+ "learning_rate": 0.00014821374045801528,
902
+ "loss": 1.9737,
903
+ "mean_token_accuracy": 0.6332821652293206,
904
+ "num_tokens": 1071482.0,
905
+ "step": 950
906
+ },
907
+ {
908
+ "epoch": 0.8533333333333334,
909
+ "grad_norm": 1.773294448852539,
910
+ "learning_rate": 0.00014760305343511452,
911
+ "loss": 2.054,
912
+ "mean_token_accuracy": 0.6256278708577157,
913
+ "num_tokens": 1082672.0,
914
+ "step": 960
915
+ },
916
+ {
917
+ "epoch": 0.8622222222222222,
918
+ "grad_norm": 1.6936707496643066,
919
+ "learning_rate": 0.00014699236641221375,
920
+ "loss": 1.9957,
921
+ "mean_token_accuracy": 0.6333451583981514,
922
+ "num_tokens": 1093493.0,
923
+ "step": 970
924
+ },
925
+ {
926
+ "epoch": 0.8711111111111111,
927
+ "grad_norm": 1.7029008865356445,
928
+ "learning_rate": 0.000146381679389313,
929
+ "loss": 2.0526,
930
+ "mean_token_accuracy": 0.6244132176041604,
931
+ "num_tokens": 1104857.0,
932
+ "step": 980
933
+ },
934
+ {
935
+ "epoch": 0.88,
936
+ "grad_norm": 1.8421082496643066,
937
+ "learning_rate": 0.00014577099236641223,
938
+ "loss": 2.0311,
939
+ "mean_token_accuracy": 0.6236826583743096,
940
+ "num_tokens": 1116131.0,
941
+ "step": 990
942
+ },
943
+ {
944
+ "epoch": 0.8888888888888888,
945
+ "grad_norm": 1.646053433418274,
946
+ "learning_rate": 0.00014516030534351146,
947
+ "loss": 1.9973,
948
+ "mean_token_accuracy": 0.6274659112095833,
949
+ "num_tokens": 1127612.0,
950
+ "step": 1000
951
+ },
952
+ {
953
+ "epoch": 0.8888888888888888,
954
+ "eval_loss": 1.989682674407959,
955
+ "eval_mean_token_accuracy": 0.633990108013153,
956
+ "eval_num_tokens": 1127612.0,
957
+ "eval_runtime": 49.3043,
958
+ "eval_samples_per_second": 20.282,
959
+ "eval_steps_per_second": 10.141,
960
+ "step": 1000
961
+ },
962
+ {
963
+ "epoch": 0.8977777777777778,
964
+ "grad_norm": 1.5941271781921387,
965
+ "learning_rate": 0.0001445496183206107,
966
+ "loss": 2.0579,
967
+ "mean_token_accuracy": 0.6256210282444954,
968
+ "num_tokens": 1138866.0,
969
+ "step": 1010
970
+ },
971
+ {
972
+ "epoch": 0.9066666666666666,
973
+ "grad_norm": 1.7826253175735474,
974
+ "learning_rate": 0.00014393893129770993,
975
+ "loss": 1.9866,
976
+ "mean_token_accuracy": 0.6332772478461266,
977
+ "num_tokens": 1150411.0,
978
+ "step": 1020
979
+ },
980
+ {
981
+ "epoch": 0.9155555555555556,
982
+ "grad_norm": 1.8722221851348877,
983
+ "learning_rate": 0.00014332824427480917,
984
+ "loss": 2.0398,
985
+ "mean_token_accuracy": 0.627329595386982,
986
+ "num_tokens": 1161360.0,
987
+ "step": 1030
988
+ },
989
+ {
990
+ "epoch": 0.9244444444444444,
991
+ "grad_norm": 1.6533294916152954,
992
+ "learning_rate": 0.0001427175572519084,
993
+ "loss": 2.0271,
994
+ "mean_token_accuracy": 0.6259514302015304,
995
+ "num_tokens": 1172683.0,
996
+ "step": 1040
997
+ },
998
+ {
999
+ "epoch": 0.9333333333333333,
1000
+ "grad_norm": 1.5746543407440186,
1001
+ "learning_rate": 0.00014210687022900764,
1002
+ "loss": 1.9634,
1003
+ "mean_token_accuracy": 0.6359310179948807,
1004
+ "num_tokens": 1183277.0,
1005
+ "step": 1050
1006
+ },
1007
+ {
1008
+ "epoch": 0.9422222222222222,
1009
+ "grad_norm": 1.6094276905059814,
1010
+ "learning_rate": 0.00014149618320610688,
1011
+ "loss": 1.9195,
1012
+ "mean_token_accuracy": 0.649330523610115,
1013
+ "num_tokens": 1194160.0,
1014
+ "step": 1060
1015
+ },
1016
+ {
1017
+ "epoch": 0.9511111111111111,
1018
+ "grad_norm": 1.9643882513046265,
1019
+ "learning_rate": 0.0001408854961832061,
1020
+ "loss": 2.0042,
1021
+ "mean_token_accuracy": 0.6356254667043686,
1022
+ "num_tokens": 1205308.0,
1023
+ "step": 1070
1024
+ },
1025
+ {
1026
+ "epoch": 0.96,
1027
+ "grad_norm": 1.8238948583602905,
1028
+ "learning_rate": 0.00014027480916030535,
1029
+ "loss": 1.9172,
1030
+ "mean_token_accuracy": 0.6497033536434174,
1031
+ "num_tokens": 1215760.0,
1032
+ "step": 1080
1033
+ },
1034
+ {
1035
+ "epoch": 0.9688888888888889,
1036
+ "grad_norm": 1.7422380447387695,
1037
+ "learning_rate": 0.00013966412213740458,
1038
+ "loss": 2.0213,
1039
+ "mean_token_accuracy": 0.6309294819831848,
1040
+ "num_tokens": 1226775.0,
1041
+ "step": 1090
1042
+ },
1043
+ {
1044
+ "epoch": 0.9777777777777777,
1045
+ "grad_norm": 1.651795744895935,
1046
+ "learning_rate": 0.00013905343511450382,
1047
+ "loss": 2.033,
1048
+ "mean_token_accuracy": 0.6295390352606773,
1049
+ "num_tokens": 1238191.0,
1050
+ "step": 1100
1051
+ },
1052
+ {
1053
+ "epoch": 0.9866666666666667,
1054
+ "grad_norm": 1.673543095588684,
1055
+ "learning_rate": 0.00013844274809160308,
1056
+ "loss": 2.0085,
1057
+ "mean_token_accuracy": 0.6329691678285598,
1058
+ "num_tokens": 1249561.0,
1059
+ "step": 1110
1060
+ },
1061
+ {
1062
+ "epoch": 0.9955555555555555,
1063
+ "grad_norm": 1.7423163652420044,
1064
+ "learning_rate": 0.0001378320610687023,
1065
+ "loss": 1.9751,
1066
+ "mean_token_accuracy": 0.6307685926556588,
1067
+ "num_tokens": 1260429.0,
1068
+ "step": 1120
1069
+ },
1070
+ {
1071
+ "epoch": 1.0044444444444445,
1072
+ "grad_norm": 1.4878981113433838,
1073
+ "learning_rate": 0.00013722137404580153,
1074
+ "loss": 1.9171,
1075
+ "mean_token_accuracy": 0.644737622141838,
1076
+ "num_tokens": 1271111.0,
1077
+ "step": 1130
1078
+ },
1079
+ {
1080
+ "epoch": 1.0133333333333334,
1081
+ "grad_norm": 1.5343797206878662,
1082
+ "learning_rate": 0.00013661068702290076,
1083
+ "loss": 1.8544,
1084
+ "mean_token_accuracy": 0.6503374725580215,
1085
+ "num_tokens": 1282434.0,
1086
+ "step": 1140
1087
+ },
1088
+ {
1089
+ "epoch": 1.0222222222222221,
1090
+ "grad_norm": 1.5450340509414673,
1091
+ "learning_rate": 0.00013600000000000003,
1092
+ "loss": 1.828,
1093
+ "mean_token_accuracy": 0.6514182686805725,
1094
+ "num_tokens": 1294382.0,
1095
+ "step": 1150
1096
+ },
1097
+ {
1098
+ "epoch": 1.031111111111111,
1099
+ "grad_norm": 1.8313877582550049,
1100
+ "learning_rate": 0.00013538931297709923,
1101
+ "loss": 1.7704,
1102
+ "mean_token_accuracy": 0.6693721905350685,
1103
+ "num_tokens": 1305343.0,
1104
+ "step": 1160
1105
+ },
1106
+ {
1107
+ "epoch": 1.04,
1108
+ "grad_norm": 1.8418430089950562,
1109
+ "learning_rate": 0.00013477862595419847,
1110
+ "loss": 1.7591,
1111
+ "mean_token_accuracy": 0.67226582467556,
1112
+ "num_tokens": 1316558.0,
1113
+ "step": 1170
1114
+ },
1115
+ {
1116
+ "epoch": 1.048888888888889,
1117
+ "grad_norm": 1.6022825241088867,
1118
+ "learning_rate": 0.0001341679389312977,
1119
+ "loss": 1.8048,
1120
+ "mean_token_accuracy": 0.6629651457071304,
1121
+ "num_tokens": 1327938.0,
1122
+ "step": 1180
1123
+ },
1124
+ {
1125
+ "epoch": 1.0577777777777777,
1126
+ "grad_norm": 1.5888707637786865,
1127
+ "learning_rate": 0.00013355725190839697,
1128
+ "loss": 1.773,
1129
+ "mean_token_accuracy": 0.6730352655053139,
1130
+ "num_tokens": 1338732.0,
1131
+ "step": 1190
1132
+ },
1133
+ {
1134
+ "epoch": 1.0666666666666667,
1135
+ "grad_norm": 1.833946943283081,
1136
+ "learning_rate": 0.0001329465648854962,
1137
+ "loss": 1.7887,
1138
+ "mean_token_accuracy": 0.6616317644715309,
1139
+ "num_tokens": 1350096.0,
1140
+ "step": 1200
1141
+ },
1142
+ {
1143
+ "epoch": 1.0666666666666667,
1144
+ "eval_loss": 1.9697085618972778,
1145
+ "eval_mean_token_accuracy": 0.6378205664157868,
1146
+ "eval_num_tokens": 1350096.0,
1147
+ "eval_runtime": 49.9237,
1148
+ "eval_samples_per_second": 20.031,
1149
+ "eval_steps_per_second": 10.015,
1150
+ "step": 1200
1151
+ },
1152
+ {
1153
+ "epoch": 1.0755555555555556,
1154
+ "grad_norm": 1.6338160037994385,
1155
+ "learning_rate": 0.00013233587786259541,
1156
+ "loss": 1.7889,
1157
+ "mean_token_accuracy": 0.6668319672346115,
1158
+ "num_tokens": 1360771.0,
1159
+ "step": 1210
1160
+ },
1161
+ {
1162
+ "epoch": 1.0844444444444445,
1163
+ "grad_norm": 1.8737561702728271,
1164
+ "learning_rate": 0.00013172519083969465,
1165
+ "loss": 1.7997,
1166
+ "mean_token_accuracy": 0.6570939287543297,
1167
+ "num_tokens": 1372450.0,
1168
+ "step": 1220
1169
+ },
1170
+ {
1171
+ "epoch": 1.0933333333333333,
1172
+ "grad_norm": 1.758074402809143,
1173
+ "learning_rate": 0.0001311145038167939,
1174
+ "loss": 1.8457,
1175
+ "mean_token_accuracy": 0.653074924647808,
1176
+ "num_tokens": 1383711.0,
1177
+ "step": 1230
1178
+ },
1179
+ {
1180
+ "epoch": 1.1022222222222222,
1181
+ "grad_norm": 1.839158296585083,
1182
+ "learning_rate": 0.00013050381679389315,
1183
+ "loss": 1.8013,
1184
+ "mean_token_accuracy": 0.6608111187815666,
1185
+ "num_tokens": 1394856.0,
1186
+ "step": 1240
1187
+ },
1188
+ {
1189
+ "epoch": 1.1111111111111112,
1190
+ "grad_norm": 1.733567476272583,
1191
+ "learning_rate": 0.00012989312977099238,
1192
+ "loss": 1.7814,
1193
+ "mean_token_accuracy": 0.6655508041381836,
1194
+ "num_tokens": 1406193.0,
1195
+ "step": 1250
1196
+ },
1197
+ {
1198
+ "epoch": 1.12,
1199
+ "grad_norm": 1.6274900436401367,
1200
+ "learning_rate": 0.0001292824427480916,
1201
+ "loss": 1.858,
1202
+ "mean_token_accuracy": 0.6488608077168465,
1203
+ "num_tokens": 1417607.0,
1204
+ "step": 1260
1205
+ },
1206
+ {
1207
+ "epoch": 1.1288888888888888,
1208
+ "grad_norm": 1.690090537071228,
1209
+ "learning_rate": 0.00012867175572519086,
1210
+ "loss": 1.8256,
1211
+ "mean_token_accuracy": 0.6595686703920365,
1212
+ "num_tokens": 1429073.0,
1213
+ "step": 1270
1214
+ },
1215
+ {
1216
+ "epoch": 1.1377777777777778,
1217
+ "grad_norm": 1.6638071537017822,
1218
+ "learning_rate": 0.0001280610687022901,
1219
+ "loss": 1.8334,
1220
+ "mean_token_accuracy": 0.6580470725893974,
1221
+ "num_tokens": 1440194.0,
1222
+ "step": 1280
1223
+ },
1224
+ {
1225
+ "epoch": 1.1466666666666667,
1226
+ "grad_norm": 1.8339307308197021,
1227
+ "learning_rate": 0.00012745038167938933,
1228
+ "loss": 1.783,
1229
+ "mean_token_accuracy": 0.6632378786802292,
1230
+ "num_tokens": 1451221.0,
1231
+ "step": 1290
1232
+ },
1233
+ {
1234
+ "epoch": 1.1555555555555554,
1235
+ "grad_norm": 1.7621415853500366,
1236
+ "learning_rate": 0.00012683969465648854,
1237
+ "loss": 1.844,
1238
+ "mean_token_accuracy": 0.6506654173135757,
1239
+ "num_tokens": 1462493.0,
1240
+ "step": 1300
1241
+ },
1242
+ {
1243
+ "epoch": 1.1644444444444444,
1244
+ "grad_norm": 1.7811567783355713,
1245
+ "learning_rate": 0.00012622900763358777,
1246
+ "loss": 1.8235,
1247
+ "mean_token_accuracy": 0.6505810797214509,
1248
+ "num_tokens": 1473710.0,
1249
+ "step": 1310
1250
+ },
1251
+ {
1252
+ "epoch": 1.1733333333333333,
1253
+ "grad_norm": 1.9157836437225342,
1254
+ "learning_rate": 0.00012561832061068704,
1255
+ "loss": 1.8885,
1256
+ "mean_token_accuracy": 0.6459546625614166,
1257
+ "num_tokens": 1485215.0,
1258
+ "step": 1320
1259
+ },
1260
+ {
1261
+ "epoch": 1.1822222222222223,
1262
+ "grad_norm": 1.6572569608688354,
1263
+ "learning_rate": 0.00012500763358778627,
1264
+ "loss": 1.813,
1265
+ "mean_token_accuracy": 0.6597578257322312,
1266
+ "num_tokens": 1496371.0,
1267
+ "step": 1330
1268
+ },
1269
+ {
1270
+ "epoch": 1.1911111111111112,
1271
+ "grad_norm": 1.8602449893951416,
1272
+ "learning_rate": 0.0001243969465648855,
1273
+ "loss": 1.8179,
1274
+ "mean_token_accuracy": 0.6519266426563263,
1275
+ "num_tokens": 1508348.0,
1276
+ "step": 1340
1277
+ },
1278
+ {
1279
+ "epoch": 1.2,
1280
+ "grad_norm": 1.8736369609832764,
1281
+ "learning_rate": 0.00012378625954198472,
1282
+ "loss": 1.8029,
1283
+ "mean_token_accuracy": 0.6621162816882133,
1284
+ "num_tokens": 1519322.0,
1285
+ "step": 1350
1286
+ },
1287
+ {
1288
+ "epoch": 1.208888888888889,
1289
+ "grad_norm": 2.026744842529297,
1290
+ "learning_rate": 0.00012317557251908398,
1291
+ "loss": 1.8168,
1292
+ "mean_token_accuracy": 0.6635635286569596,
1293
+ "num_tokens": 1530183.0,
1294
+ "step": 1360
1295
+ },
1296
+ {
1297
+ "epoch": 1.2177777777777778,
1298
+ "grad_norm": 1.7360782623291016,
1299
+ "learning_rate": 0.00012256488549618322,
1300
+ "loss": 1.7521,
1301
+ "mean_token_accuracy": 0.6706348299980164,
1302
+ "num_tokens": 1540862.0,
1303
+ "step": 1370
1304
+ },
1305
+ {
1306
+ "epoch": 1.2266666666666666,
1307
+ "grad_norm": 1.9620578289031982,
1308
+ "learning_rate": 0.00012195419847328244,
1309
+ "loss": 1.8228,
1310
+ "mean_token_accuracy": 0.6569086670875549,
1311
+ "num_tokens": 1552212.0,
1312
+ "step": 1380
1313
+ },
1314
+ {
1315
+ "epoch": 1.2355555555555555,
1316
+ "grad_norm": 1.6294327974319458,
1317
+ "learning_rate": 0.00012134351145038167,
1318
+ "loss": 1.7654,
1319
+ "mean_token_accuracy": 0.6697377026081085,
1320
+ "num_tokens": 1563356.0,
1321
+ "step": 1390
1322
+ },
1323
+ {
1324
+ "epoch": 1.2444444444444445,
1325
+ "grad_norm": 1.7311524152755737,
1326
+ "learning_rate": 0.00012073282442748092,
1327
+ "loss": 1.9019,
1328
+ "mean_token_accuracy": 0.6457875579595566,
1329
+ "num_tokens": 1574569.0,
1330
+ "step": 1400
1331
+ },
1332
+ {
1333
+ "epoch": 1.2444444444444445,
1334
+ "eval_loss": 1.9411770105361938,
1335
+ "eval_mean_token_accuracy": 0.6407178282737732,
1336
+ "eval_num_tokens": 1574569.0,
1337
+ "eval_runtime": 48.3309,
1338
+ "eval_samples_per_second": 20.691,
1339
+ "eval_steps_per_second": 10.345,
1340
+ "step": 1400
1341
+ },
1342
+ {
1343
+ "epoch": 1.2533333333333334,
1344
+ "grad_norm": 1.8629728555679321,
1345
+ "learning_rate": 0.00012012213740458016,
1346
+ "loss": 1.7585,
1347
+ "mean_token_accuracy": 0.671015702188015,
1348
+ "num_tokens": 1585308.0,
1349
+ "step": 1410
1350
+ },
1351
+ {
1352
+ "epoch": 1.2622222222222224,
1353
+ "grad_norm": 1.958808183670044,
1354
+ "learning_rate": 0.0001195114503816794,
1355
+ "loss": 1.8479,
1356
+ "mean_token_accuracy": 0.6535898372530937,
1357
+ "num_tokens": 1596886.0,
1358
+ "step": 1420
1359
+ },
1360
+ {
1361
+ "epoch": 1.271111111111111,
1362
+ "grad_norm": 1.950421690940857,
1363
+ "learning_rate": 0.00011890076335877862,
1364
+ "loss": 1.8173,
1365
+ "mean_token_accuracy": 0.6655478686094284,
1366
+ "num_tokens": 1607683.0,
1367
+ "step": 1430
1368
+ },
1369
+ {
1370
+ "epoch": 1.28,
1371
+ "grad_norm": 1.8152872323989868,
1372
+ "learning_rate": 0.00011829007633587788,
1373
+ "loss": 1.8791,
1374
+ "mean_token_accuracy": 0.6531546950340271,
1375
+ "num_tokens": 1618906.0,
1376
+ "step": 1440
1377
+ },
1378
+ {
1379
+ "epoch": 1.2888888888888888,
1380
+ "grad_norm": 1.7857719659805298,
1381
+ "learning_rate": 0.0001176793893129771,
1382
+ "loss": 1.7887,
1383
+ "mean_token_accuracy": 0.6610255971550941,
1384
+ "num_tokens": 1629981.0,
1385
+ "step": 1450
1386
+ },
1387
+ {
1388
+ "epoch": 1.2977777777777777,
1389
+ "grad_norm": 1.8434971570968628,
1390
+ "learning_rate": 0.00011706870229007634,
1391
+ "loss": 1.8368,
1392
+ "mean_token_accuracy": 0.653369964659214,
1393
+ "num_tokens": 1641429.0,
1394
+ "step": 1460
1395
+ },
1396
+ {
1397
+ "epoch": 1.3066666666666666,
1398
+ "grad_norm": 1.8877320289611816,
1399
+ "learning_rate": 0.00011645801526717557,
1400
+ "loss": 1.7938,
1401
+ "mean_token_accuracy": 0.6639183640480042,
1402
+ "num_tokens": 1652601.0,
1403
+ "step": 1470
1404
+ },
1405
+ {
1406
+ "epoch": 1.3155555555555556,
1407
+ "grad_norm": 1.8121625185012817,
1408
+ "learning_rate": 0.00011584732824427482,
1409
+ "loss": 1.7862,
1410
+ "mean_token_accuracy": 0.661414910852909,
1411
+ "num_tokens": 1663837.0,
1412
+ "step": 1480
1413
+ },
1414
+ {
1415
+ "epoch": 1.3244444444444445,
1416
+ "grad_norm": 1.7919855117797852,
1417
+ "learning_rate": 0.00011523664122137406,
1418
+ "loss": 1.8148,
1419
+ "mean_token_accuracy": 0.6654411420226097,
1420
+ "num_tokens": 1675018.0,
1421
+ "step": 1490
1422
+ },
1423
+ {
1424
+ "epoch": 1.3333333333333333,
1425
+ "grad_norm": 1.828735589981079,
1426
+ "learning_rate": 0.00011462595419847328,
1427
+ "loss": 1.8456,
1428
+ "mean_token_accuracy": 0.6496043875813484,
1429
+ "num_tokens": 1686136.0,
1430
+ "step": 1500
1431
+ },
1432
+ {
1433
+ "epoch": 1.3422222222222222,
1434
+ "grad_norm": 1.9462794065475464,
1435
+ "learning_rate": 0.00011401526717557252,
1436
+ "loss": 1.8412,
1437
+ "mean_token_accuracy": 0.6603908941149712,
1438
+ "num_tokens": 1697160.0,
1439
+ "step": 1510
1440
+ },
1441
+ {
1442
+ "epoch": 1.3511111111111112,
1443
+ "grad_norm": 1.6794313192367554,
1444
+ "learning_rate": 0.00011340458015267177,
1445
+ "loss": 1.7774,
1446
+ "mean_token_accuracy": 0.6664682924747467,
1447
+ "num_tokens": 1707831.0,
1448
+ "step": 1520
1449
+ },
1450
+ {
1451
+ "epoch": 1.3599999999999999,
1452
+ "grad_norm": 1.8189337253570557,
1453
+ "learning_rate": 0.000112793893129771,
1454
+ "loss": 1.8031,
1455
+ "mean_token_accuracy": 0.6627006307244301,
1456
+ "num_tokens": 1719074.0,
1457
+ "step": 1530
1458
+ },
1459
+ {
1460
+ "epoch": 1.3688888888888888,
1461
+ "grad_norm": 2.073533296585083,
1462
+ "learning_rate": 0.00011218320610687022,
1463
+ "loss": 1.8657,
1464
+ "mean_token_accuracy": 0.6476830393075943,
1465
+ "num_tokens": 1730388.0,
1466
+ "step": 1540
1467
+ },
1468
+ {
1469
+ "epoch": 1.3777777777777778,
1470
+ "grad_norm": 2.1564207077026367,
1471
+ "learning_rate": 0.00011157251908396946,
1472
+ "loss": 1.8261,
1473
+ "mean_token_accuracy": 0.6567840203642845,
1474
+ "num_tokens": 1741806.0,
1475
+ "step": 1550
1476
+ },
1477
+ {
1478
+ "epoch": 1.3866666666666667,
1479
+ "grad_norm": 1.6113232374191284,
1480
+ "learning_rate": 0.00011096183206106871,
1481
+ "loss": 1.7753,
1482
+ "mean_token_accuracy": 0.6659888163208961,
1483
+ "num_tokens": 1753313.0,
1484
+ "step": 1560
1485
+ },
1486
+ {
1487
+ "epoch": 1.3955555555555557,
1488
+ "grad_norm": 1.8112174272537231,
1489
+ "learning_rate": 0.00011035114503816795,
1490
+ "loss": 1.8046,
1491
+ "mean_token_accuracy": 0.6593015149235726,
1492
+ "num_tokens": 1765144.0,
1493
+ "step": 1570
1494
+ },
1495
+ {
1496
+ "epoch": 1.4044444444444444,
1497
+ "grad_norm": 1.8377541303634644,
1498
+ "learning_rate": 0.00010974045801526718,
1499
+ "loss": 1.8848,
1500
+ "mean_token_accuracy": 0.6533517614006996,
1501
+ "num_tokens": 1776783.0,
1502
+ "step": 1580
1503
+ },
1504
+ {
1505
+ "epoch": 1.4133333333333333,
1506
+ "grad_norm": 1.8384325504302979,
1507
+ "learning_rate": 0.0001091297709923664,
1508
+ "loss": 1.7669,
1509
+ "mean_token_accuracy": 0.6613995045423507,
1510
+ "num_tokens": 1788274.0,
1511
+ "step": 1590
1512
+ },
1513
+ {
1514
+ "epoch": 1.4222222222222223,
1515
+ "grad_norm": 1.8124533891677856,
1516
+ "learning_rate": 0.00010851908396946567,
1517
+ "loss": 1.8164,
1518
+ "mean_token_accuracy": 0.6591159239411354,
1519
+ "num_tokens": 1799707.0,
1520
+ "step": 1600
1521
+ },
1522
+ {
1523
+ "epoch": 1.4222222222222223,
1524
+ "eval_loss": 1.9286668300628662,
1525
+ "eval_mean_token_accuracy": 0.6434953879117966,
1526
+ "eval_num_tokens": 1799707.0,
1527
+ "eval_runtime": 48.6198,
1528
+ "eval_samples_per_second": 20.568,
1529
+ "eval_steps_per_second": 10.284,
1530
+ "step": 1600
1531
+ },
1532
+ {
1533
+ "epoch": 1.431111111111111,
1534
+ "grad_norm": 1.6931661367416382,
1535
+ "learning_rate": 0.00010790839694656489,
1536
+ "loss": 1.7548,
1537
+ "mean_token_accuracy": 0.664087076485157,
1538
+ "num_tokens": 1810865.0,
1539
+ "step": 1610
1540
+ },
1541
+ {
1542
+ "epoch": 1.44,
1543
+ "grad_norm": 1.7501254081726074,
1544
+ "learning_rate": 0.00010729770992366413,
1545
+ "loss": 1.7652,
1546
+ "mean_token_accuracy": 0.6640020117163659,
1547
+ "num_tokens": 1821807.0,
1548
+ "step": 1620
1549
+ },
1550
+ {
1551
+ "epoch": 1.448888888888889,
1552
+ "grad_norm": 1.8411732912063599,
1553
+ "learning_rate": 0.00010668702290076336,
1554
+ "loss": 1.831,
1555
+ "mean_token_accuracy": 0.6564242169260979,
1556
+ "num_tokens": 1832886.0,
1557
+ "step": 1630
1558
+ },
1559
+ {
1560
+ "epoch": 1.4577777777777778,
1561
+ "grad_norm": 2.003892183303833,
1562
+ "learning_rate": 0.00010607633587786261,
1563
+ "loss": 1.7791,
1564
+ "mean_token_accuracy": 0.6632592365145683,
1565
+ "num_tokens": 1843989.0,
1566
+ "step": 1640
1567
+ },
1568
+ {
1569
+ "epoch": 1.4666666666666668,
1570
+ "grad_norm": 1.7987340688705444,
1571
+ "learning_rate": 0.00010546564885496185,
1572
+ "loss": 1.7627,
1573
+ "mean_token_accuracy": 0.6713873609900475,
1574
+ "num_tokens": 1855106.0,
1575
+ "step": 1650
1576
+ },
1577
+ {
1578
+ "epoch": 1.4755555555555555,
1579
+ "grad_norm": 1.931877851486206,
1580
+ "learning_rate": 0.00010485496183206107,
1581
+ "loss": 1.7976,
1582
+ "mean_token_accuracy": 0.6631382897496223,
1583
+ "num_tokens": 1866900.0,
1584
+ "step": 1660
1585
+ },
1586
+ {
1587
+ "epoch": 1.4844444444444445,
1588
+ "grad_norm": 1.7883687019348145,
1589
+ "learning_rate": 0.0001042442748091603,
1590
+ "loss": 1.7671,
1591
+ "mean_token_accuracy": 0.6675158813595772,
1592
+ "num_tokens": 1877911.0,
1593
+ "step": 1670
1594
+ },
1595
+ {
1596
+ "epoch": 1.4933333333333334,
1597
+ "grad_norm": 1.8195563554763794,
1598
+ "learning_rate": 0.00010363358778625955,
1599
+ "loss": 1.8346,
1600
+ "mean_token_accuracy": 0.652577318251133,
1601
+ "num_tokens": 1889580.0,
1602
+ "step": 1680
1603
+ },
1604
+ {
1605
+ "epoch": 1.5022222222222221,
1606
+ "grad_norm": 1.7439149618148804,
1607
+ "learning_rate": 0.00010302290076335879,
1608
+ "loss": 1.7476,
1609
+ "mean_token_accuracy": 0.6717594474554062,
1610
+ "num_tokens": 1901133.0,
1611
+ "step": 1690
1612
+ },
1613
+ {
1614
+ "epoch": 1.511111111111111,
1615
+ "grad_norm": 1.8155314922332764,
1616
+ "learning_rate": 0.00010241221374045801,
1617
+ "loss": 1.8044,
1618
+ "mean_token_accuracy": 0.6617274522781372,
1619
+ "num_tokens": 1911796.0,
1620
+ "step": 1700
1621
+ },
1622
+ {
1623
+ "epoch": 1.52,
1624
+ "grad_norm": 1.7685112953186035,
1625
+ "learning_rate": 0.00010180152671755725,
1626
+ "loss": 1.7727,
1627
+ "mean_token_accuracy": 0.665304908156395,
1628
+ "num_tokens": 1923217.0,
1629
+ "step": 1710
1630
+ },
1631
+ {
1632
+ "epoch": 1.528888888888889,
1633
+ "grad_norm": 1.737053632736206,
1634
+ "learning_rate": 0.0001011908396946565,
1635
+ "loss": 1.8345,
1636
+ "mean_token_accuracy": 0.6577870160341263,
1637
+ "num_tokens": 1934355.0,
1638
+ "step": 1720
1639
+ },
1640
+ {
1641
+ "epoch": 1.537777777777778,
1642
+ "grad_norm": 1.9686291217803955,
1643
+ "learning_rate": 0.00010058015267175573,
1644
+ "loss": 1.8165,
1645
+ "mean_token_accuracy": 0.6594037398695946,
1646
+ "num_tokens": 1945653.0,
1647
+ "step": 1730
1648
+ },
1649
+ {
1650
+ "epoch": 1.5466666666666666,
1651
+ "grad_norm": 1.844651699066162,
1652
+ "learning_rate": 9.996946564885497e-05,
1653
+ "loss": 1.8273,
1654
+ "mean_token_accuracy": 0.6566928923130035,
1655
+ "num_tokens": 1956891.0,
1656
+ "step": 1740
1657
+ },
1658
+ {
1659
+ "epoch": 1.5555555555555556,
1660
+ "grad_norm": 1.8607743978500366,
1661
+ "learning_rate": 9.93587786259542e-05,
1662
+ "loss": 1.785,
1663
+ "mean_token_accuracy": 0.6692357853055,
1664
+ "num_tokens": 1967789.0,
1665
+ "step": 1750
1666
+ },
1667
+ {
1668
+ "epoch": 1.5644444444444443,
1669
+ "grad_norm": 1.9204373359680176,
1670
+ "learning_rate": 9.874809160305344e-05,
1671
+ "loss": 1.8264,
1672
+ "mean_token_accuracy": 0.6549209818243981,
1673
+ "num_tokens": 1979224.0,
1674
+ "step": 1760
1675
+ },
1676
+ {
1677
+ "epoch": 1.5733333333333333,
1678
+ "grad_norm": 1.7754265069961548,
1679
+ "learning_rate": 9.813740458015268e-05,
1680
+ "loss": 1.7467,
1681
+ "mean_token_accuracy": 0.6670090600848197,
1682
+ "num_tokens": 1990255.0,
1683
+ "step": 1770
1684
+ },
1685
+ {
1686
+ "epoch": 1.5822222222222222,
1687
+ "grad_norm": 2.069091796875,
1688
+ "learning_rate": 9.752671755725191e-05,
1689
+ "loss": 1.7731,
1690
+ "mean_token_accuracy": 0.6609751120209694,
1691
+ "num_tokens": 2001606.0,
1692
+ "step": 1780
1693
+ },
1694
+ {
1695
+ "epoch": 1.5911111111111111,
1696
+ "grad_norm": 2.1375646591186523,
1697
+ "learning_rate": 9.691603053435115e-05,
1698
+ "loss": 1.8009,
1699
+ "mean_token_accuracy": 0.6624869346618653,
1700
+ "num_tokens": 2012912.0,
1701
+ "step": 1790
1702
+ },
1703
+ {
1704
+ "epoch": 1.6,
1705
+ "grad_norm": 1.5623434782028198,
1706
+ "learning_rate": 9.630534351145038e-05,
1707
+ "loss": 1.7383,
1708
+ "mean_token_accuracy": 0.6694582119584084,
1709
+ "num_tokens": 2024571.0,
1710
+ "step": 1800
1711
+ },
1712
+ {
1713
+ "epoch": 1.6,
1714
+ "eval_loss": 1.90510892868042,
1715
+ "eval_mean_token_accuracy": 0.6464553346633911,
1716
+ "eval_num_tokens": 2024571.0,
1717
+ "eval_runtime": 48.9449,
1718
+ "eval_samples_per_second": 20.431,
1719
+ "eval_steps_per_second": 10.216,
1720
+ "step": 1800
1721
+ },
1722
+ {
1723
+ "epoch": 1.608888888888889,
1724
+ "grad_norm": 1.745969295501709,
1725
+ "learning_rate": 9.569465648854963e-05,
1726
+ "loss": 1.7552,
1727
+ "mean_token_accuracy": 0.6786300778388977,
1728
+ "num_tokens": 2035783.0,
1729
+ "step": 1810
1730
+ },
1731
+ {
1732
+ "epoch": 1.6177777777777778,
1733
+ "grad_norm": 1.7463303804397583,
1734
+ "learning_rate": 9.508396946564886e-05,
1735
+ "loss": 1.7495,
1736
+ "mean_token_accuracy": 0.6666959136724472,
1737
+ "num_tokens": 2047304.0,
1738
+ "step": 1820
1739
+ },
1740
+ {
1741
+ "epoch": 1.6266666666666667,
1742
+ "grad_norm": 1.9058139324188232,
1743
+ "learning_rate": 9.44732824427481e-05,
1744
+ "loss": 1.8365,
1745
+ "mean_token_accuracy": 0.6536470741033554,
1746
+ "num_tokens": 2058792.0,
1747
+ "step": 1830
1748
+ },
1749
+ {
1750
+ "epoch": 1.6355555555555554,
1751
+ "grad_norm": 2.065488576889038,
1752
+ "learning_rate": 9.386259541984733e-05,
1753
+ "loss": 1.7939,
1754
+ "mean_token_accuracy": 0.6519258007407188,
1755
+ "num_tokens": 2070175.0,
1756
+ "step": 1840
1757
+ },
1758
+ {
1759
+ "epoch": 1.6444444444444444,
1760
+ "grad_norm": 1.778023600578308,
1761
+ "learning_rate": 9.325190839694658e-05,
1762
+ "loss": 1.8155,
1763
+ "mean_token_accuracy": 0.655296416580677,
1764
+ "num_tokens": 2081343.0,
1765
+ "step": 1850
1766
+ },
1767
+ {
1768
+ "epoch": 1.6533333333333333,
1769
+ "grad_norm": 1.7437517642974854,
1770
+ "learning_rate": 9.26412213740458e-05,
1771
+ "loss": 1.7996,
1772
+ "mean_token_accuracy": 0.6618543311953544,
1773
+ "num_tokens": 2093074.0,
1774
+ "step": 1860
1775
+ },
1776
+ {
1777
+ "epoch": 1.6622222222222223,
1778
+ "grad_norm": 1.7666471004486084,
1779
+ "learning_rate": 9.203053435114505e-05,
1780
+ "loss": 1.7658,
1781
+ "mean_token_accuracy": 0.6631957843899727,
1782
+ "num_tokens": 2104640.0,
1783
+ "step": 1870
1784
+ },
1785
+ {
1786
+ "epoch": 1.6711111111111112,
1787
+ "grad_norm": 1.912842869758606,
1788
+ "learning_rate": 9.141984732824428e-05,
1789
+ "loss": 1.7996,
1790
+ "mean_token_accuracy": 0.6606781020760536,
1791
+ "num_tokens": 2115628.0,
1792
+ "step": 1880
1793
+ },
1794
+ {
1795
+ "epoch": 1.6800000000000002,
1796
+ "grad_norm": 1.7230331897735596,
1797
+ "learning_rate": 9.080916030534351e-05,
1798
+ "loss": 1.8042,
1799
+ "mean_token_accuracy": 0.6600380197167397,
1800
+ "num_tokens": 2126505.0,
1801
+ "step": 1890
1802
+ },
1803
+ {
1804
+ "epoch": 1.6888888888888889,
1805
+ "grad_norm": 1.7043401002883911,
1806
+ "learning_rate": 9.019847328244276e-05,
1807
+ "loss": 1.7993,
1808
+ "mean_token_accuracy": 0.6613149493932724,
1809
+ "num_tokens": 2138364.0,
1810
+ "step": 1900
1811
+ },
1812
+ {
1813
+ "epoch": 1.6977777777777778,
1814
+ "grad_norm": 1.9145572185516357,
1815
+ "learning_rate": 8.958778625954198e-05,
1816
+ "loss": 1.8046,
1817
+ "mean_token_accuracy": 0.662477345764637,
1818
+ "num_tokens": 2149425.0,
1819
+ "step": 1910
1820
+ },
1821
+ {
1822
+ "epoch": 1.7066666666666666,
1823
+ "grad_norm": 1.7448140382766724,
1824
+ "learning_rate": 8.897709923664123e-05,
1825
+ "loss": 1.8004,
1826
+ "mean_token_accuracy": 0.6539181426167489,
1827
+ "num_tokens": 2160843.0,
1828
+ "step": 1920
1829
+ },
1830
+ {
1831
+ "epoch": 1.7155555555555555,
1832
+ "grad_norm": 1.8304840326309204,
1833
+ "learning_rate": 8.836641221374045e-05,
1834
+ "loss": 1.8404,
1835
+ "mean_token_accuracy": 0.6593489304184914,
1836
+ "num_tokens": 2172044.0,
1837
+ "step": 1930
1838
+ },
1839
+ {
1840
+ "epoch": 1.7244444444444444,
1841
+ "grad_norm": 1.802331566810608,
1842
+ "learning_rate": 8.77557251908397e-05,
1843
+ "loss": 1.7995,
1844
+ "mean_token_accuracy": 0.6634193584322929,
1845
+ "num_tokens": 2182916.0,
1846
+ "step": 1940
1847
+ },
1848
+ {
1849
+ "epoch": 1.7333333333333334,
1850
+ "grad_norm": 1.9834682941436768,
1851
+ "learning_rate": 8.714503816793894e-05,
1852
+ "loss": 1.7525,
1853
+ "mean_token_accuracy": 0.6685526207089424,
1854
+ "num_tokens": 2194913.0,
1855
+ "step": 1950
1856
+ },
1857
+ {
1858
+ "epoch": 1.7422222222222223,
1859
+ "grad_norm": 1.8077235221862793,
1860
+ "learning_rate": 8.653435114503817e-05,
1861
+ "loss": 1.7612,
1862
+ "mean_token_accuracy": 0.6704939991235733,
1863
+ "num_tokens": 2205721.0,
1864
+ "step": 1960
1865
+ },
1866
+ {
1867
+ "epoch": 1.751111111111111,
1868
+ "grad_norm": 1.957993745803833,
1869
+ "learning_rate": 8.592366412213741e-05,
1870
+ "loss": 1.8059,
1871
+ "mean_token_accuracy": 0.6547697961330414,
1872
+ "num_tokens": 2217489.0,
1873
+ "step": 1970
1874
+ },
1875
+ {
1876
+ "epoch": 1.76,
1877
+ "grad_norm": 1.7215981483459473,
1878
+ "learning_rate": 8.531297709923664e-05,
1879
+ "loss": 1.7913,
1880
+ "mean_token_accuracy": 0.657075221836567,
1881
+ "num_tokens": 2228972.0,
1882
+ "step": 1980
1883
+ },
1884
+ {
1885
+ "epoch": 1.7688888888888887,
1886
+ "grad_norm": 1.8760231733322144,
1887
+ "learning_rate": 8.470229007633588e-05,
1888
+ "loss": 1.7923,
1889
+ "mean_token_accuracy": 0.6629065066576004,
1890
+ "num_tokens": 2240239.0,
1891
+ "step": 1990
1892
+ },
1893
+ {
1894
+ "epoch": 1.7777777777777777,
1895
+ "grad_norm": 2.092407703399658,
1896
+ "learning_rate": 8.409160305343512e-05,
1897
+ "loss": 1.7593,
1898
+ "mean_token_accuracy": 0.6686230883002281,
1899
+ "num_tokens": 2251436.0,
1900
+ "step": 2000
1901
+ },
1902
+ {
1903
+ "epoch": 1.7777777777777777,
1904
+ "eval_loss": 1.893255591392517,
1905
+ "eval_mean_token_accuracy": 0.6482590944766998,
1906
+ "eval_num_tokens": 2251436.0,
1907
+ "eval_runtime": 49.0676,
1908
+ "eval_samples_per_second": 20.38,
1909
+ "eval_steps_per_second": 10.19,
1910
+ "step": 2000
1911
+ },
1912
+ {
1913
+ "epoch": 1.7866666666666666,
1914
+ "grad_norm": 1.7836107015609741,
1915
+ "learning_rate": 8.348091603053435e-05,
1916
+ "loss": 1.8033,
1917
+ "mean_token_accuracy": 0.6598399996757507,
1918
+ "num_tokens": 2263069.0,
1919
+ "step": 2010
1920
+ },
1921
+ {
1922
+ "epoch": 1.7955555555555556,
1923
+ "grad_norm": 1.7955141067504883,
1924
+ "learning_rate": 8.287022900763359e-05,
1925
+ "loss": 1.7922,
1926
+ "mean_token_accuracy": 0.6619856491684913,
1927
+ "num_tokens": 2274050.0,
1928
+ "step": 2020
1929
+ },
1930
+ {
1931
+ "epoch": 1.8044444444444445,
1932
+ "grad_norm": 1.7887564897537231,
1933
+ "learning_rate": 8.225954198473282e-05,
1934
+ "loss": 1.8353,
1935
+ "mean_token_accuracy": 0.658150726556778,
1936
+ "num_tokens": 2285060.0,
1937
+ "step": 2030
1938
+ },
1939
+ {
1940
+ "epoch": 1.8133333333333335,
1941
+ "grad_norm": 1.8892567157745361,
1942
+ "learning_rate": 8.164885496183207e-05,
1943
+ "loss": 1.7266,
1944
+ "mean_token_accuracy": 0.6728688895702362,
1945
+ "num_tokens": 2296211.0,
1946
+ "step": 2040
1947
+ },
1948
+ {
1949
+ "epoch": 1.8222222222222222,
1950
+ "grad_norm": 1.9226106405258179,
1951
+ "learning_rate": 8.10381679389313e-05,
1952
+ "loss": 1.7243,
1953
+ "mean_token_accuracy": 0.6712497785687447,
1954
+ "num_tokens": 2307184.0,
1955
+ "step": 2050
1956
+ },
1957
+ {
1958
+ "epoch": 1.8311111111111111,
1959
+ "grad_norm": 1.735863208770752,
1960
+ "learning_rate": 8.042748091603054e-05,
1961
+ "loss": 1.7739,
1962
+ "mean_token_accuracy": 0.6621047109365463,
1963
+ "num_tokens": 2318602.0,
1964
+ "step": 2060
1965
+ },
1966
+ {
1967
+ "epoch": 1.8399999999999999,
1968
+ "grad_norm": 1.8361355066299438,
1969
+ "learning_rate": 7.981679389312977e-05,
1970
+ "loss": 1.8223,
1971
+ "mean_token_accuracy": 0.6560095950961113,
1972
+ "num_tokens": 2330193.0,
1973
+ "step": 2070
1974
+ },
1975
+ {
1976
+ "epoch": 1.8488888888888888,
1977
+ "grad_norm": 1.8159486055374146,
1978
+ "learning_rate": 7.920610687022902e-05,
1979
+ "loss": 1.7695,
1980
+ "mean_token_accuracy": 0.6657541528344154,
1981
+ "num_tokens": 2341442.0,
1982
+ "step": 2080
1983
+ },
1984
+ {
1985
+ "epoch": 1.8577777777777778,
1986
+ "grad_norm": 1.9189419746398926,
1987
+ "learning_rate": 7.859541984732824e-05,
1988
+ "loss": 1.8333,
1989
+ "mean_token_accuracy": 0.6628425523638726,
1990
+ "num_tokens": 2352479.0,
1991
+ "step": 2090
1992
+ },
1993
+ {
1994
+ "epoch": 1.8666666666666667,
1995
+ "grad_norm": 1.8809512853622437,
1996
+ "learning_rate": 7.798473282442749e-05,
1997
+ "loss": 1.7371,
1998
+ "mean_token_accuracy": 0.6683435723185539,
1999
+ "num_tokens": 2363642.0,
2000
+ "step": 2100
2001
+ },
2002
+ {
2003
+ "epoch": 1.8755555555555556,
2004
+ "grad_norm": 1.845886468887329,
2005
+ "learning_rate": 7.737404580152672e-05,
2006
+ "loss": 1.7774,
2007
+ "mean_token_accuracy": 0.6559944331645966,
2008
+ "num_tokens": 2375376.0,
2009
+ "step": 2110
2010
+ },
2011
+ {
2012
+ "epoch": 1.8844444444444446,
2013
+ "grad_norm": 1.7780894041061401,
2014
+ "learning_rate": 7.676335877862596e-05,
2015
+ "loss": 1.7823,
2016
+ "mean_token_accuracy": 0.6601730152964592,
2017
+ "num_tokens": 2386944.0,
2018
+ "step": 2120
2019
+ },
2020
+ {
2021
+ "epoch": 1.8933333333333333,
2022
+ "grad_norm": 1.9167022705078125,
2023
+ "learning_rate": 7.61526717557252e-05,
2024
+ "loss": 1.7869,
2025
+ "mean_token_accuracy": 0.6573449537158013,
2026
+ "num_tokens": 2398391.0,
2027
+ "step": 2130
2028
+ },
2029
+ {
2030
+ "epoch": 1.9022222222222223,
2031
+ "grad_norm": 2.037911891937256,
2032
+ "learning_rate": 7.554198473282443e-05,
2033
+ "loss": 1.7858,
2034
+ "mean_token_accuracy": 0.6593190267682075,
2035
+ "num_tokens": 2409837.0,
2036
+ "step": 2140
2037
+ },
2038
+ {
2039
+ "epoch": 1.911111111111111,
2040
+ "grad_norm": 1.7496647834777832,
2041
+ "learning_rate": 7.493129770992367e-05,
2042
+ "loss": 1.7241,
2043
+ "mean_token_accuracy": 0.6702290028333664,
2044
+ "num_tokens": 2421607.0,
2045
+ "step": 2150
2046
+ },
2047
+ {
2048
+ "epoch": 1.92,
2049
+ "grad_norm": 2.0227596759796143,
2050
+ "learning_rate": 7.43206106870229e-05,
2051
+ "loss": 1.7731,
2052
+ "mean_token_accuracy": 0.6679618924856185,
2053
+ "num_tokens": 2432376.0,
2054
+ "step": 2160
2055
+ },
2056
+ {
2057
+ "epoch": 1.9288888888888889,
2058
+ "grad_norm": 1.7401562929153442,
2059
+ "learning_rate": 7.370992366412214e-05,
2060
+ "loss": 1.7684,
2061
+ "mean_token_accuracy": 0.6676609605550766,
2062
+ "num_tokens": 2443683.0,
2063
+ "step": 2170
2064
+ },
2065
+ {
2066
+ "epoch": 1.9377777777777778,
2067
+ "grad_norm": 2.709106922149658,
2068
+ "learning_rate": 7.309923664122137e-05,
2069
+ "loss": 1.709,
2070
+ "mean_token_accuracy": 0.6738818466663361,
2071
+ "num_tokens": 2454757.0,
2072
+ "step": 2180
2073
+ },
2074
+ {
2075
+ "epoch": 1.9466666666666668,
2076
+ "grad_norm": 1.8504191637039185,
2077
+ "learning_rate": 7.248854961832061e-05,
2078
+ "loss": 1.7411,
2079
+ "mean_token_accuracy": 0.6681609645485878,
2080
+ "num_tokens": 2465562.0,
2081
+ "step": 2190
2082
+ },
2083
+ {
2084
+ "epoch": 1.9555555555555557,
2085
+ "grad_norm": 1.9488162994384766,
2086
+ "learning_rate": 7.187786259541986e-05,
2087
+ "loss": 1.7927,
2088
+ "mean_token_accuracy": 0.6587553441524505,
2089
+ "num_tokens": 2476869.0,
2090
+ "step": 2200
2091
+ },
2092
+ {
2093
+ "epoch": 1.9555555555555557,
2094
+ "eval_loss": 1.8803235292434692,
2095
+ "eval_mean_token_accuracy": 0.6499251070022583,
2096
+ "eval_num_tokens": 2476869.0,
2097
+ "eval_runtime": 47.7648,
2098
+ "eval_samples_per_second": 20.936,
2099
+ "eval_steps_per_second": 10.468,
2100
+ "step": 2200
2101
+ },
2102
+ {
2103
+ "epoch": 1.9644444444444444,
2104
+ "grad_norm": 1.9747337102890015,
2105
+ "learning_rate": 7.132824427480917e-05,
2106
+ "loss": 1.7689,
2107
+ "mean_token_accuracy": 0.666295376420021,
2108
+ "num_tokens": 2487704.0,
2109
+ "step": 2210
2110
+ },
2111
+ {
2112
+ "epoch": 1.9733333333333334,
2113
+ "grad_norm": 1.8904316425323486,
2114
+ "learning_rate": 7.071755725190839e-05,
2115
+ "loss": 1.7538,
2116
+ "mean_token_accuracy": 0.6645636394619941,
2117
+ "num_tokens": 2498918.0,
2118
+ "step": 2220
2119
+ },
2120
+ {
2121
+ "epoch": 1.982222222222222,
2122
+ "grad_norm": 1.8791844844818115,
2123
+ "learning_rate": 7.010687022900764e-05,
2124
+ "loss": 1.7926,
2125
+ "mean_token_accuracy": 0.6631673067808151,
2126
+ "num_tokens": 2509728.0,
2127
+ "step": 2230
2128
+ },
2129
+ {
2130
+ "epoch": 1.991111111111111,
2131
+ "grad_norm": 1.9756606817245483,
2132
+ "learning_rate": 6.949618320610687e-05,
2133
+ "loss": 1.7863,
2134
+ "mean_token_accuracy": 0.6628521859645844,
2135
+ "num_tokens": 2521073.0,
2136
+ "step": 2240
2137
+ },
2138
+ {
2139
+ "epoch": 2.0,
2140
+ "grad_norm": 1.7894699573516846,
2141
+ "learning_rate": 6.888549618320611e-05,
2142
+ "loss": 1.7539,
2143
+ "mean_token_accuracy": 0.6728802308440208,
2144
+ "num_tokens": 2531820.0,
2145
+ "step": 2250
2146
+ },
2147
+ {
2148
+ "epoch": 2.008888888888889,
2149
+ "grad_norm": 1.702850341796875,
2150
+ "learning_rate": 6.827480916030535e-05,
2151
+ "loss": 1.4903,
2152
+ "mean_token_accuracy": 0.7138098135590554,
2153
+ "num_tokens": 2542512.0,
2154
+ "step": 2260
2155
+ },
2156
+ {
2157
+ "epoch": 2.017777777777778,
2158
+ "grad_norm": 1.7931528091430664,
2159
+ "learning_rate": 6.766412213740458e-05,
2160
+ "loss": 1.601,
2161
+ "mean_token_accuracy": 0.6894692406058311,
2162
+ "num_tokens": 2553338.0,
2163
+ "step": 2270
2164
+ },
2165
+ {
2166
+ "epoch": 2.026666666666667,
2167
+ "grad_norm": 2.228480339050293,
2168
+ "learning_rate": 6.705343511450382e-05,
2169
+ "loss": 1.609,
2170
+ "mean_token_accuracy": 0.6943154886364937,
2171
+ "num_tokens": 2564182.0,
2172
+ "step": 2280
2173
+ },
2174
+ {
2175
+ "epoch": 2.0355555555555553,
2176
+ "grad_norm": 1.9658042192459106,
2177
+ "learning_rate": 6.644274809160305e-05,
2178
+ "loss": 1.6545,
2179
+ "mean_token_accuracy": 0.6824306204915047,
2180
+ "num_tokens": 2575789.0,
2181
+ "step": 2290
2182
+ },
2183
+ {
2184
+ "epoch": 2.0444444444444443,
2185
+ "grad_norm": 1.7540594339370728,
2186
+ "learning_rate": 6.583206106870229e-05,
2187
+ "loss": 1.6229,
2188
+ "mean_token_accuracy": 0.6881745710968972,
2189
+ "num_tokens": 2587147.0,
2190
+ "step": 2300
2191
+ },
2192
+ {
2193
+ "epoch": 2.0533333333333332,
2194
+ "grad_norm": 1.799501895904541,
2195
+ "learning_rate": 6.522137404580153e-05,
2196
+ "loss": 1.6119,
2197
+ "mean_token_accuracy": 0.6896049126982688,
2198
+ "num_tokens": 2598282.0,
2199
+ "step": 2310
2200
+ },
2201
+ {
2202
+ "epoch": 2.062222222222222,
2203
+ "grad_norm": 1.7720867395401,
2204
+ "learning_rate": 6.461068702290076e-05,
2205
+ "loss": 1.5519,
2206
+ "mean_token_accuracy": 0.7038252353668213,
2207
+ "num_tokens": 2609125.0,
2208
+ "step": 2320
2209
+ },
2210
+ {
2211
+ "epoch": 2.071111111111111,
2212
+ "grad_norm": 1.994992971420288,
2213
+ "learning_rate": 6.400000000000001e-05,
2214
+ "loss": 1.5872,
2215
+ "mean_token_accuracy": 0.690100908279419,
2216
+ "num_tokens": 2620411.0,
2217
+ "step": 2330
2218
+ },
2219
+ {
2220
+ "epoch": 2.08,
2221
+ "grad_norm": 1.9283640384674072,
2222
+ "learning_rate": 6.338931297709923e-05,
2223
+ "loss": 1.5867,
2224
+ "mean_token_accuracy": 0.6923216238617897,
2225
+ "num_tokens": 2631795.0,
2226
+ "step": 2340
2227
+ },
2228
+ {
2229
+ "epoch": 2.088888888888889,
2230
+ "grad_norm": 1.9957973957061768,
2231
+ "learning_rate": 6.277862595419848e-05,
2232
+ "loss": 1.5996,
2233
+ "mean_token_accuracy": 0.6924369186162949,
2234
+ "num_tokens": 2643179.0,
2235
+ "step": 2350
2236
+ },
2237
+ {
2238
+ "epoch": 2.097777777777778,
2239
+ "grad_norm": 2.0207560062408447,
2240
+ "learning_rate": 6.21679389312977e-05,
2241
+ "loss": 1.515,
2242
+ "mean_token_accuracy": 0.7066755428910255,
2243
+ "num_tokens": 2654206.0,
2244
+ "step": 2360
2245
+ },
2246
+ {
2247
+ "epoch": 2.1066666666666665,
2248
+ "grad_norm": 1.8871878385543823,
2249
+ "learning_rate": 6.155725190839695e-05,
2250
+ "loss": 1.6139,
2251
+ "mean_token_accuracy": 0.687422800064087,
2252
+ "num_tokens": 2665582.0,
2253
+ "step": 2370
2254
+ },
2255
+ {
2256
+ "epoch": 2.1155555555555554,
2257
+ "grad_norm": 1.717610478401184,
2258
+ "learning_rate": 6.094656488549618e-05,
2259
+ "loss": 1.6388,
2260
+ "mean_token_accuracy": 0.6870575189590454,
2261
+ "num_tokens": 2677533.0,
2262
+ "step": 2380
2263
+ },
2264
+ {
2265
+ "epoch": 2.1244444444444444,
2266
+ "grad_norm": 1.8574187755584717,
2267
+ "learning_rate": 6.0335877862595426e-05,
2268
+ "loss": 1.557,
2269
+ "mean_token_accuracy": 0.6999430671334267,
2270
+ "num_tokens": 2688755.0,
2271
+ "step": 2390
2272
+ },
2273
+ {
2274
+ "epoch": 2.1333333333333333,
2275
+ "grad_norm": 1.9739580154418945,
2276
+ "learning_rate": 5.9725190839694655e-05,
2277
+ "loss": 1.6553,
2278
+ "mean_token_accuracy": 0.6819543272256852,
2279
+ "num_tokens": 2700558.0,
2280
+ "step": 2400
2281
+ },
2282
+ {
2283
+ "epoch": 2.1333333333333333,
2284
+ "eval_loss": 1.8970768451690674,
2285
+ "eval_mean_token_accuracy": 0.6490416256189346,
2286
+ "eval_num_tokens": 2700558.0,
2287
+ "eval_runtime": 47.6704,
2288
+ "eval_samples_per_second": 20.977,
2289
+ "eval_steps_per_second": 10.489,
2290
+ "step": 2400
2291
+ },
2292
+ {
2293
+ "epoch": 2.1422222222222222,
2294
+ "grad_norm": 1.893918514251709,
2295
+ "learning_rate": 5.91145038167939e-05,
2296
+ "loss": 1.5459,
2297
+ "mean_token_accuracy": 0.6963777393102646,
2298
+ "num_tokens": 2711713.0,
2299
+ "step": 2410
2300
+ },
2301
+ {
2302
+ "epoch": 2.151111111111111,
2303
+ "grad_norm": 1.9607445001602173,
2304
+ "learning_rate": 5.850381679389313e-05,
2305
+ "loss": 1.6373,
2306
+ "mean_token_accuracy": 0.6815788432955742,
2307
+ "num_tokens": 2723686.0,
2308
+ "step": 2420
2309
+ },
2310
+ {
2311
+ "epoch": 2.16,
2312
+ "grad_norm": 2.091732978820801,
2313
+ "learning_rate": 5.789312977099237e-05,
2314
+ "loss": 1.6422,
2315
+ "mean_token_accuracy": 0.6811213716864586,
2316
+ "num_tokens": 2735300.0,
2317
+ "step": 2430
2318
+ },
2319
+ {
2320
+ "epoch": 2.168888888888889,
2321
+ "grad_norm": 2.1138076782226562,
2322
+ "learning_rate": 5.7282442748091605e-05,
2323
+ "loss": 1.5848,
2324
+ "mean_token_accuracy": 0.6962573245167732,
2325
+ "num_tokens": 2746248.0,
2326
+ "step": 2440
2327
+ },
2328
+ {
2329
+ "epoch": 2.1777777777777776,
2330
+ "grad_norm": 2.1495392322540283,
2331
+ "learning_rate": 5.667175572519085e-05,
2332
+ "loss": 1.576,
2333
+ "mean_token_accuracy": 0.6990228727459907,
2334
+ "num_tokens": 2757259.0,
2335
+ "step": 2450
2336
+ },
2337
+ {
2338
+ "epoch": 2.1866666666666665,
2339
+ "grad_norm": 2.1444251537323,
2340
+ "learning_rate": 5.606106870229008e-05,
2341
+ "loss": 1.5979,
2342
+ "mean_token_accuracy": 0.6916472837328911,
2343
+ "num_tokens": 2768228.0,
2344
+ "step": 2460
2345
+ },
2346
+ {
2347
+ "epoch": 2.1955555555555555,
2348
+ "grad_norm": 1.945489525794983,
2349
+ "learning_rate": 5.545038167938932e-05,
2350
+ "loss": 1.5663,
2351
+ "mean_token_accuracy": 0.7005513325333595,
2352
+ "num_tokens": 2779254.0,
2353
+ "step": 2470
2354
+ },
2355
+ {
2356
+ "epoch": 2.2044444444444444,
2357
+ "grad_norm": 1.8256646394729614,
2358
+ "learning_rate": 5.483969465648855e-05,
2359
+ "loss": 1.5751,
2360
+ "mean_token_accuracy": 0.6961624413728714,
2361
+ "num_tokens": 2790326.0,
2362
+ "step": 2480
2363
+ },
2364
+ {
2365
+ "epoch": 2.2133333333333334,
2366
+ "grad_norm": 1.9541441202163696,
2367
+ "learning_rate": 5.422900763358779e-05,
2368
+ "loss": 1.6268,
2369
+ "mean_token_accuracy": 0.6893054991960526,
2370
+ "num_tokens": 2801625.0,
2371
+ "step": 2490
2372
+ },
2373
+ {
2374
+ "epoch": 2.2222222222222223,
2375
+ "grad_norm": 2.0127615928649902,
2376
+ "learning_rate": 5.361832061068702e-05,
2377
+ "loss": 1.6096,
2378
+ "mean_token_accuracy": 0.6923437744379044,
2379
+ "num_tokens": 2813010.0,
2380
+ "step": 2500
2381
+ },
2382
+ {
2383
+ "epoch": 2.2311111111111113,
2384
+ "grad_norm": 2.0325839519500732,
2385
+ "learning_rate": 5.300763358778626e-05,
2386
+ "loss": 1.5963,
2387
+ "mean_token_accuracy": 0.6913090571761131,
2388
+ "num_tokens": 2824021.0,
2389
+ "step": 2510
2390
+ },
2391
+ {
2392
+ "epoch": 2.24,
2393
+ "grad_norm": 2.1595821380615234,
2394
+ "learning_rate": 5.23969465648855e-05,
2395
+ "loss": 1.5617,
2396
+ "mean_token_accuracy": 0.7037980020046234,
2397
+ "num_tokens": 2835232.0,
2398
+ "step": 2520
2399
+ },
2400
+ {
2401
+ "epoch": 2.2488888888888887,
2402
+ "grad_norm": 2.11661958694458,
2403
+ "learning_rate": 5.178625954198474e-05,
2404
+ "loss": 1.6213,
2405
+ "mean_token_accuracy": 0.6836483731865883,
2406
+ "num_tokens": 2846524.0,
2407
+ "step": 2530
2408
+ },
2409
+ {
2410
+ "epoch": 2.2577777777777777,
2411
+ "grad_norm": 1.88747239112854,
2412
+ "learning_rate": 5.117557251908397e-05,
2413
+ "loss": 1.6408,
2414
+ "mean_token_accuracy": 0.6860729962587356,
2415
+ "num_tokens": 2857788.0,
2416
+ "step": 2540
2417
+ },
2418
+ {
2419
+ "epoch": 2.2666666666666666,
2420
+ "grad_norm": 1.9622093439102173,
2421
+ "learning_rate": 5.056488549618321e-05,
2422
+ "loss": 1.5519,
2423
+ "mean_token_accuracy": 0.7002682030200958,
2424
+ "num_tokens": 2868618.0,
2425
+ "step": 2550
2426
+ },
2427
+ {
2428
+ "epoch": 2.2755555555555556,
2429
+ "grad_norm": 1.9343371391296387,
2430
+ "learning_rate": 4.995419847328244e-05,
2431
+ "loss": 1.5795,
2432
+ "mean_token_accuracy": 0.6934511423110962,
2433
+ "num_tokens": 2879999.0,
2434
+ "step": 2560
2435
+ },
2436
+ {
2437
+ "epoch": 2.2844444444444445,
2438
+ "grad_norm": 1.9991627931594849,
2439
+ "learning_rate": 4.934351145038168e-05,
2440
+ "loss": 1.6183,
2441
+ "mean_token_accuracy": 0.6901679039001465,
2442
+ "num_tokens": 2891053.0,
2443
+ "step": 2570
2444
+ },
2445
+ {
2446
+ "epoch": 2.2933333333333334,
2447
+ "grad_norm": 1.9480003118515015,
2448
+ "learning_rate": 4.8732824427480914e-05,
2449
+ "loss": 1.5826,
2450
+ "mean_token_accuracy": 0.7007558569312096,
2451
+ "num_tokens": 2901905.0,
2452
+ "step": 2580
2453
+ },
2454
+ {
2455
+ "epoch": 2.3022222222222224,
2456
+ "grad_norm": 2.021207332611084,
2457
+ "learning_rate": 4.812213740458015e-05,
2458
+ "loss": 1.6348,
2459
+ "mean_token_accuracy": 0.6848765298724174,
2460
+ "num_tokens": 2913571.0,
2461
+ "step": 2590
2462
+ },
2463
+ {
2464
+ "epoch": 2.311111111111111,
2465
+ "grad_norm": 1.8385164737701416,
2466
+ "learning_rate": 4.751145038167939e-05,
2467
+ "loss": 1.5763,
2468
+ "mean_token_accuracy": 0.6912240386009216,
2469
+ "num_tokens": 2925533.0,
2470
+ "step": 2600
2471
+ },
2472
+ {
2473
+ "epoch": 2.311111111111111,
2474
+ "eval_loss": 1.8940143585205078,
2475
+ "eval_mean_token_accuracy": 0.6499911918640137,
2476
+ "eval_num_tokens": 2925533.0,
2477
+ "eval_runtime": 47.456,
2478
+ "eval_samples_per_second": 21.072,
2479
+ "eval_steps_per_second": 10.536,
2480
+ "step": 2600
2481
+ },
2482
+ {
2483
+ "epoch": 2.32,
2484
+ "grad_norm": 1.9455375671386719,
2485
+ "learning_rate": 4.690076335877863e-05,
2486
+ "loss": 1.598,
2487
+ "mean_token_accuracy": 0.6915700435638428,
2488
+ "num_tokens": 2936620.0,
2489
+ "step": 2610
2490
+ },
2491
+ {
2492
+ "epoch": 2.328888888888889,
2493
+ "grad_norm": 1.863487720489502,
2494
+ "learning_rate": 4.6290076335877864e-05,
2495
+ "loss": 1.5512,
2496
+ "mean_token_accuracy": 0.7025073647499085,
2497
+ "num_tokens": 2947753.0,
2498
+ "step": 2620
2499
+ },
2500
+ {
2501
+ "epoch": 2.3377777777777777,
2502
+ "grad_norm": 1.9756685495376587,
2503
+ "learning_rate": 4.56793893129771e-05,
2504
+ "loss": 1.5973,
2505
+ "mean_token_accuracy": 0.6870647758245468,
2506
+ "num_tokens": 2959635.0,
2507
+ "step": 2630
2508
+ },
2509
+ {
2510
+ "epoch": 2.3466666666666667,
2511
+ "grad_norm": 2.190765142440796,
2512
+ "learning_rate": 4.5068702290076336e-05,
2513
+ "loss": 1.5948,
2514
+ "mean_token_accuracy": 0.6888303905725479,
2515
+ "num_tokens": 2971675.0,
2516
+ "step": 2640
2517
+ },
2518
+ {
2519
+ "epoch": 2.3555555555555556,
2520
+ "grad_norm": 1.827318787574768,
2521
+ "learning_rate": 4.445801526717557e-05,
2522
+ "loss": 1.5682,
2523
+ "mean_token_accuracy": 0.6952902913093567,
2524
+ "num_tokens": 2982744.0,
2525
+ "step": 2650
2526
+ },
2527
+ {
2528
+ "epoch": 2.3644444444444446,
2529
+ "grad_norm": 2.11799693107605,
2530
+ "learning_rate": 4.384732824427481e-05,
2531
+ "loss": 1.6221,
2532
+ "mean_token_accuracy": 0.6794109031558037,
2533
+ "num_tokens": 2994347.0,
2534
+ "step": 2660
2535
+ },
2536
+ {
2537
+ "epoch": 2.3733333333333335,
2538
+ "grad_norm": 2.1472220420837402,
2539
+ "learning_rate": 4.3236641221374044e-05,
2540
+ "loss": 1.6353,
2541
+ "mean_token_accuracy": 0.6876759916543961,
2542
+ "num_tokens": 3005174.0,
2543
+ "step": 2670
2544
+ },
2545
+ {
2546
+ "epoch": 2.3822222222222225,
2547
+ "grad_norm": 1.9971054792404175,
2548
+ "learning_rate": 4.2625954198473286e-05,
2549
+ "loss": 1.5372,
2550
+ "mean_token_accuracy": 0.7059834420680999,
2551
+ "num_tokens": 3016492.0,
2552
+ "step": 2680
2553
+ },
2554
+ {
2555
+ "epoch": 2.391111111111111,
2556
+ "grad_norm": 2.067861318588257,
2557
+ "learning_rate": 4.201526717557252e-05,
2558
+ "loss": 1.572,
2559
+ "mean_token_accuracy": 0.6911077201366425,
2560
+ "num_tokens": 3027826.0,
2561
+ "step": 2690
2562
+ },
2563
+ {
2564
+ "epoch": 2.4,
2565
+ "grad_norm": 2.0372536182403564,
2566
+ "learning_rate": 4.140458015267176e-05,
2567
+ "loss": 1.5615,
2568
+ "mean_token_accuracy": 0.6972797185182571,
2569
+ "num_tokens": 3038770.0,
2570
+ "step": 2700
2571
+ },
2572
+ {
2573
+ "epoch": 2.408888888888889,
2574
+ "grad_norm": 2.15972638130188,
2575
+ "learning_rate": 4.0793893129770994e-05,
2576
+ "loss": 1.5806,
2577
+ "mean_token_accuracy": 0.6947444006800652,
2578
+ "num_tokens": 3050159.0,
2579
+ "step": 2710
2580
+ },
2581
+ {
2582
+ "epoch": 2.417777777777778,
2583
+ "grad_norm": 2.059760808944702,
2584
+ "learning_rate": 4.018320610687023e-05,
2585
+ "loss": 1.6167,
2586
+ "mean_token_accuracy": 0.6882677704095841,
2587
+ "num_tokens": 3061009.0,
2588
+ "step": 2720
2589
+ },
2590
+ {
2591
+ "epoch": 2.4266666666666667,
2592
+ "grad_norm": 1.9914629459381104,
2593
+ "learning_rate": 3.9572519083969466e-05,
2594
+ "loss": 1.5508,
2595
+ "mean_token_accuracy": 0.6985371947288513,
2596
+ "num_tokens": 3072232.0,
2597
+ "step": 2730
2598
+ },
2599
+ {
2600
+ "epoch": 2.4355555555555557,
2601
+ "grad_norm": 2.0151119232177734,
2602
+ "learning_rate": 3.89618320610687e-05,
2603
+ "loss": 1.663,
2604
+ "mean_token_accuracy": 0.6849021047353745,
2605
+ "num_tokens": 3083939.0,
2606
+ "step": 2740
2607
+ },
2608
+ {
2609
+ "epoch": 2.4444444444444446,
2610
+ "grad_norm": 2.02457332611084,
2611
+ "learning_rate": 3.835114503816794e-05,
2612
+ "loss": 1.6043,
2613
+ "mean_token_accuracy": 0.6891427770256996,
2614
+ "num_tokens": 3095354.0,
2615
+ "step": 2750
2616
+ },
2617
+ {
2618
+ "epoch": 2.453333333333333,
2619
+ "grad_norm": 1.930341362953186,
2620
+ "learning_rate": 3.774045801526718e-05,
2621
+ "loss": 1.5648,
2622
+ "mean_token_accuracy": 0.6962095096707344,
2623
+ "num_tokens": 3106679.0,
2624
+ "step": 2760
2625
+ },
2626
+ {
2627
+ "epoch": 2.462222222222222,
2628
+ "grad_norm": 2.1718850135803223,
2629
+ "learning_rate": 3.7129770992366416e-05,
2630
+ "loss": 1.5514,
2631
+ "mean_token_accuracy": 0.6997211873531342,
2632
+ "num_tokens": 3117440.0,
2633
+ "step": 2770
2634
+ },
2635
+ {
2636
+ "epoch": 2.471111111111111,
2637
+ "grad_norm": 1.89506196975708,
2638
+ "learning_rate": 3.651908396946565e-05,
2639
+ "loss": 1.6102,
2640
+ "mean_token_accuracy": 0.6865462198853493,
2641
+ "num_tokens": 3128685.0,
2642
+ "step": 2780
2643
+ },
2644
+ {
2645
+ "epoch": 2.48,
2646
+ "grad_norm": 2.1102652549743652,
2647
+ "learning_rate": 3.590839694656489e-05,
2648
+ "loss": 1.6092,
2649
+ "mean_token_accuracy": 0.6845578849315643,
2650
+ "num_tokens": 3140574.0,
2651
+ "step": 2790
2652
+ },
2653
+ {
2654
+ "epoch": 2.488888888888889,
2655
+ "grad_norm": 1.9541523456573486,
2656
+ "learning_rate": 3.5297709923664124e-05,
2657
+ "loss": 1.6245,
2658
+ "mean_token_accuracy": 0.6867643877863884,
2659
+ "num_tokens": 3151937.0,
2660
+ "step": 2800
2661
+ },
2662
+ {
2663
+ "epoch": 2.488888888888889,
2664
+ "eval_loss": 1.8869248628616333,
2665
+ "eval_mean_token_accuracy": 0.6508636207580566,
2666
+ "eval_num_tokens": 3151937.0,
2667
+ "eval_runtime": 46.9872,
2668
+ "eval_samples_per_second": 21.282,
2669
+ "eval_steps_per_second": 10.641,
2670
+ "step": 2800
2671
+ },
2672
+ {
2673
+ "epoch": 2.497777777777778,
2674
+ "grad_norm": 2.006448984146118,
2675
+ "learning_rate": 3.468702290076336e-05,
2676
+ "loss": 1.6458,
2677
+ "mean_token_accuracy": 0.6835160732269288,
2678
+ "num_tokens": 3163343.0,
2679
+ "step": 2810
2680
+ },
2681
+ {
2682
+ "epoch": 2.506666666666667,
2683
+ "grad_norm": 2.0644562244415283,
2684
+ "learning_rate": 3.4076335877862595e-05,
2685
+ "loss": 1.5841,
2686
+ "mean_token_accuracy": 0.699130979180336,
2687
+ "num_tokens": 3174278.0,
2688
+ "step": 2820
2689
+ },
2690
+ {
2691
+ "epoch": 2.5155555555555553,
2692
+ "grad_norm": 2.5352766513824463,
2693
+ "learning_rate": 3.346564885496183e-05,
2694
+ "loss": 1.6411,
2695
+ "mean_token_accuracy": 0.687686163187027,
2696
+ "num_tokens": 3185529.0,
2697
+ "step": 2830
2698
+ },
2699
+ {
2700
+ "epoch": 2.5244444444444447,
2701
+ "grad_norm": 2.2506706714630127,
2702
+ "learning_rate": 3.2854961832061074e-05,
2703
+ "loss": 1.5334,
2704
+ "mean_token_accuracy": 0.7042266175150871,
2705
+ "num_tokens": 3196422.0,
2706
+ "step": 2840
2707
+ },
2708
+ {
2709
+ "epoch": 2.533333333333333,
2710
+ "grad_norm": 2.038456439971924,
2711
+ "learning_rate": 3.224427480916031e-05,
2712
+ "loss": 1.5226,
2713
+ "mean_token_accuracy": 0.7002356797456741,
2714
+ "num_tokens": 3207640.0,
2715
+ "step": 2850
2716
+ },
2717
+ {
2718
+ "epoch": 2.542222222222222,
2719
+ "grad_norm": 2.0818448066711426,
2720
+ "learning_rate": 3.1633587786259545e-05,
2721
+ "loss": 1.5136,
2722
+ "mean_token_accuracy": 0.7040936380624772,
2723
+ "num_tokens": 3218742.0,
2724
+ "step": 2860
2725
+ },
2726
+ {
2727
+ "epoch": 2.551111111111111,
2728
+ "grad_norm": 1.9810820817947388,
2729
+ "learning_rate": 3.102290076335878e-05,
2730
+ "loss": 1.6515,
2731
+ "mean_token_accuracy": 0.6826088905334473,
2732
+ "num_tokens": 3230062.0,
2733
+ "step": 2870
2734
+ },
2735
+ {
2736
+ "epoch": 2.56,
2737
+ "grad_norm": 2.1830689907073975,
2738
+ "learning_rate": 3.0412213740458017e-05,
2739
+ "loss": 1.5792,
2740
+ "mean_token_accuracy": 0.699496129155159,
2741
+ "num_tokens": 3240533.0,
2742
+ "step": 2880
2743
+ },
2744
+ {
2745
+ "epoch": 2.568888888888889,
2746
+ "grad_norm": 2.101184368133545,
2747
+ "learning_rate": 2.9801526717557253e-05,
2748
+ "loss": 1.6538,
2749
+ "mean_token_accuracy": 0.6724523141980171,
2750
+ "num_tokens": 3252476.0,
2751
+ "step": 2890
2752
+ },
2753
+ {
2754
+ "epoch": 2.5777777777777775,
2755
+ "grad_norm": 2.021524429321289,
2756
+ "learning_rate": 2.9190839694656492e-05,
2757
+ "loss": 1.6146,
2758
+ "mean_token_accuracy": 0.6886414483189582,
2759
+ "num_tokens": 3263799.0,
2760
+ "step": 2900
2761
+ },
2762
+ {
2763
+ "epoch": 2.586666666666667,
2764
+ "grad_norm": 1.9668735265731812,
2765
+ "learning_rate": 2.8580152671755728e-05,
2766
+ "loss": 1.6477,
2767
+ "mean_token_accuracy": 0.678925508260727,
2768
+ "num_tokens": 3275511.0,
2769
+ "step": 2910
2770
+ },
2771
+ {
2772
+ "epoch": 2.5955555555555554,
2773
+ "grad_norm": 2.088491201400757,
2774
+ "learning_rate": 2.7969465648854964e-05,
2775
+ "loss": 1.6265,
2776
+ "mean_token_accuracy": 0.6857595339417457,
2777
+ "num_tokens": 3286752.0,
2778
+ "step": 2920
2779
+ },
2780
+ {
2781
+ "epoch": 2.6044444444444443,
2782
+ "grad_norm": 2.0536880493164062,
2783
+ "learning_rate": 2.73587786259542e-05,
2784
+ "loss": 1.66,
2785
+ "mean_token_accuracy": 0.681273227930069,
2786
+ "num_tokens": 3297945.0,
2787
+ "step": 2930
2788
+ },
2789
+ {
2790
+ "epoch": 2.6133333333333333,
2791
+ "grad_norm": 2.0063817501068115,
2792
+ "learning_rate": 2.674809160305344e-05,
2793
+ "loss": 1.5102,
2794
+ "mean_token_accuracy": 0.7025244757533073,
2795
+ "num_tokens": 3309112.0,
2796
+ "step": 2940
2797
+ },
2798
+ {
2799
+ "epoch": 2.6222222222222222,
2800
+ "grad_norm": 1.9980206489562988,
2801
+ "learning_rate": 2.6137404580152675e-05,
2802
+ "loss": 1.5142,
2803
+ "mean_token_accuracy": 0.7049572348594666,
2804
+ "num_tokens": 3320544.0,
2805
+ "step": 2950
2806
+ },
2807
+ {
2808
+ "epoch": 2.631111111111111,
2809
+ "grad_norm": 2.1506435871124268,
2810
+ "learning_rate": 2.552671755725191e-05,
2811
+ "loss": 1.5826,
2812
+ "mean_token_accuracy": 0.694467018544674,
2813
+ "num_tokens": 3331309.0,
2814
+ "step": 2960
2815
+ },
2816
+ {
2817
+ "epoch": 2.64,
2818
+ "grad_norm": 1.9890793561935425,
2819
+ "learning_rate": 2.4916030534351147e-05,
2820
+ "loss": 1.5631,
2821
+ "mean_token_accuracy": 0.6945617944002151,
2822
+ "num_tokens": 3343068.0,
2823
+ "step": 2970
2824
+ },
2825
+ {
2826
+ "epoch": 2.648888888888889,
2827
+ "grad_norm": 2.1102676391601562,
2828
+ "learning_rate": 2.4305343511450383e-05,
2829
+ "loss": 1.6145,
2830
+ "mean_token_accuracy": 0.6866093754768372,
2831
+ "num_tokens": 3354691.0,
2832
+ "step": 2980
2833
+ },
2834
+ {
2835
+ "epoch": 2.6577777777777776,
2836
+ "grad_norm": 2.2881674766540527,
2837
+ "learning_rate": 2.369465648854962e-05,
2838
+ "loss": 1.5796,
2839
+ "mean_token_accuracy": 0.6961612686514854,
2840
+ "num_tokens": 3365512.0,
2841
+ "step": 2990
2842
+ },
2843
+ {
2844
+ "epoch": 2.6666666666666665,
2845
+ "grad_norm": 1.973838210105896,
2846
+ "learning_rate": 2.3083969465648854e-05,
2847
+ "loss": 1.5456,
2848
+ "mean_token_accuracy": 0.703473174571991,
2849
+ "num_tokens": 3376406.0,
2850
+ "step": 3000
2851
+ },
2852
+ {
2853
+ "epoch": 2.6666666666666665,
2854
+ "eval_loss": 1.881131649017334,
2855
+ "eval_mean_token_accuracy": 0.6518214672803879,
2856
+ "eval_num_tokens": 3376406.0,
2857
+ "eval_runtime": 47.794,
2858
+ "eval_samples_per_second": 20.923,
2859
+ "eval_steps_per_second": 10.462,
2860
+ "step": 3000
2861
+ }
2862
+ ],
2863
+ "logging_steps": 10,
2864
+ "max_steps": 3375,
2865
+ "num_input_tokens_seen": 0,
2866
+ "num_train_epochs": 3,
2867
+ "save_steps": 200,
2868
+ "stateful_callbacks": {
2869
+ "TrainerControl": {
2870
+ "args": {
2871
+ "should_epoch_stop": false,
2872
+ "should_evaluate": false,
2873
+ "should_log": false,
2874
+ "should_save": true,
2875
+ "should_training_stop": false
2876
+ },
2877
+ "attributes": {}
2878
+ }
2879
+ },
2880
+ "total_flos": 1.0707043350011904e+16,
2881
+ "train_batch_size": 2,
2882
+ "trial_name": null,
2883
+ "trial_params": null
2884
+ }
checkpoint-3000/vocab.json ADDED
The diff for this file is too large to render. See raw diff
checkpoint-3200/README.md ADDED
@@ -0,0 +1,202 @@
+ ---
+ base_model: naver-hyperclovax/HyperCLOVAX-SEED-Text-Instruct-0.5B
+ library_name: peft
+ ---
+
+ # Model Card for Model ID
+
+ <!-- Provide a quick summary of what the model is/does. -->
+
+
+
+ ## Model Details
+
+ ### Model Description
+
+ <!-- Provide a longer summary of what this model is. -->
+
+
+
+ - **Developed by:** [More Information Needed]
+ - **Funded by [optional]:** [More Information Needed]
+ - **Shared by [optional]:** [More Information Needed]
+ - **Model type:** [More Information Needed]
+ - **Language(s) (NLP):** [More Information Needed]
+ - **License:** [More Information Needed]
+ - **Finetuned from model [optional]:** [More Information Needed]
+
+ ### Model Sources [optional]
+
+ <!-- Provide the basic links for the model. -->
+
+ - **Repository:** [More Information Needed]
+ - **Paper [optional]:** [More Information Needed]
+ - **Demo [optional]:** [More Information Needed]
+
+ ## Uses
+
+ <!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
+
+ ### Direct Use
+
+ <!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
+
+ [More Information Needed]
+
+ ### Downstream Use [optional]
+
+ <!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
+
+ [More Information Needed]
+
+ ### Out-of-Scope Use
+
+ <!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
+
+ [More Information Needed]
+
+ ## Bias, Risks, and Limitations
+
+ <!-- This section is meant to convey both technical and sociotechnical limitations. -->
+
+ [More Information Needed]
+
+ ### Recommendations
+
+ <!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
+
+ Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
+
+ ## How to Get Started with the Model
+
+ Use the code below to get started with the model.
+
+ [More Information Needed]
+
+ ## Training Details
+
+ ### Training Data
+
+ <!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
+
+ [More Information Needed]
+
+ ### Training Procedure
+
+ <!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
+
+ #### Preprocessing [optional]
+
+ [More Information Needed]
+
+
+ #### Training Hyperparameters
+
+ - **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
+
+ #### Speeds, Sizes, Times [optional]
+
+ <!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
+
+ [More Information Needed]
+
+ ## Evaluation
+
+ <!-- This section describes the evaluation protocols and provides the results. -->
+
+ ### Testing Data, Factors & Metrics
+
+ #### Testing Data
+
+ <!-- This should link to a Dataset Card if possible. -->
+
+ [More Information Needed]
+
+ #### Factors
+
+ <!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
+
+ [More Information Needed]
+
+ #### Metrics
+
+ <!-- These are the evaluation metrics being used, ideally with a description of why. -->
+
+ [More Information Needed]
+
+ ### Results
+
+ [More Information Needed]
+
+ #### Summary
+
+
+
+ ## Model Examination [optional]
+
+ <!-- Relevant interpretability work for the model goes here -->
+
+ [More Information Needed]
+
+ ## Environmental Impact
+
+ <!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
+
+ Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
+
+ - **Hardware Type:** [More Information Needed]
+ - **Hours used:** [More Information Needed]
+ - **Cloud Provider:** [More Information Needed]
+ - **Compute Region:** [More Information Needed]
+ - **Carbon Emitted:** [More Information Needed]
+
+ ## Technical Specifications [optional]
+
+ ### Model Architecture and Objective
+
+ [More Information Needed]
+
+ ### Compute Infrastructure
+
+ [More Information Needed]
+
+ #### Hardware
+
+ [More Information Needed]
+
+ #### Software
+
+ [More Information Needed]
+
+ ## Citation [optional]
+
+ <!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
+
+ **BibTeX:**
+
+ [More Information Needed]
+
+ **APA:**
+
+ [More Information Needed]
+
+ ## Glossary [optional]
+
+ <!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
+
+ [More Information Needed]
+
+ ## More Information [optional]
+
+ [More Information Needed]
+
+ ## Model Card Authors [optional]
+
+ [More Information Needed]
+
+ ## Model Card Contact
+
+ [More Information Needed]
+ ### Framework versions
+
+ - PEFT 0.15.2
checkpoint-3200/adapter_config.json ADDED
@@ -0,0 +1,39 @@
+ {
+ "alpha_pattern": {},
+ "auto_mapping": null,
+ "base_model_name_or_path": "naver-hyperclovax/HyperCLOVAX-SEED-Text-Instruct-0.5B",
+ "bias": "none",
+ "corda_config": null,
+ "eva_config": null,
+ "exclude_modules": null,
+ "fan_in_fan_out": false,
+ "inference_mode": true,
+ "init_lora_weights": true,
+ "layer_replication": null,
+ "layers_pattern": null,
+ "layers_to_transform": null,
+ "loftq_config": {},
+ "lora_alpha": 32,
+ "lora_bias": false,
+ "lora_dropout": 0.1,
+ "megatron_config": null,
+ "megatron_core": "megatron.core",
+ "modules_to_save": null,
+ "peft_type": "LORA",
+ "r": 16,
+ "rank_pattern": {},
+ "revision": null,
+ "target_modules": [
+ "q_proj",
+ "v_proj",
+ "k_proj",
+ "up_proj",
+ "gate_proj",
+ "o_proj",
+ "down_proj"
+ ],
+ "task_type": "CAUSAL_LM",
+ "trainable_token_indices": null,
+ "use_dora": false,
+ "use_rslora": false
+ }
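The adapter_config.json above fully determines how the LoRA adapter is applied: rank-16 update matrices on seven projection modules, scaled by `lora_alpha / r`. As a minimal sketch using only the standard library (the inline dict copies the key fields from the config above; nothing else is assumed), the effective scaling factor can be computed directly:

```python
import json

# Inline copy of the key fields from adapter_config.json above.
adapter_config = json.loads("""
{
  "lora_alpha": 32,
  "lora_dropout": 0.1,
  "r": 16,
  "target_modules": ["q_proj", "v_proj", "k_proj", "up_proj", "gate_proj", "o_proj", "down_proj"],
  "task_type": "CAUSAL_LM"
}
""")

# LoRA adds a low-rank update B @ A (rank r) to each target weight,
# scaled by lora_alpha / r before being summed with the frozen weight.
scaling = adapter_config["lora_alpha"] / adapter_config["r"]
print(scaling)                                 # 2.0
print(len(adapter_config["target_modules"]))   # 7 projection matrices patched
```

With `use_rslora` set to false, PEFT uses exactly this `alpha / r` scaling; rsLoRA would instead use `alpha / sqrt(r)`.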
checkpoint-3200/added_tokens.json ADDED
@@ -0,0 +1,35 @@
+ {
+ "<EMAIL>": 110521,
+ "<KEY>": 110522,
+ "<NAME>": 110520,
+ "<PASSWORD>": 110523,
+ "<code_to_intermediate>": 110502,
+ "<empty_output>": 110501,
+ "<file_sep>": 110492,
+ "<intermediate_to_code>": 110503,
+ "<issue_closed>": 110495,
+ "<issue_comment>": 110494,
+ "<issue_start>": 110493,
+ "<jupyter_code>": 110498,
+ "<jupyter_output>": 110499,
+ "<jupyter_script>": 110500,
+ "<jupyter_start>": 110496,
+ "<jupyter_text>": 110497,
+ "<pr>": 110504,
+ "<pr_base>": 110507,
+ "<pr_base_code>": 110509,
+ "<pr_comment>": 110512,
+ "<pr_diff>": 110510,
+ "<pr_diff_hunk>": 110511,
+ "<pr_diff_hunk_comment_line>": 110519,
+ "<pr_event_id>": 110513,
+ "<pr_file>": 110508,
+ "<pr_in_reply_to_comment_id>": 110518,
+ "<pr_in_reply_to_review_id>": 110517,
+ "<pr_is_merged>": 110506,
+ "<pr_review>": 110514,
+ "<pr_review_comment>": 110516,
+ "<pr_review_state>": 110515,
+ "<pr_status>": 110505,
+ "<repo_name>": 110491
+ }
checkpoint-3200/chat_template.jinja ADDED
@@ -0,0 +1,4 @@
+ {% if not add_generation_prompt is defined %}{% set add_generation_prompt = false %}{% endif %}{% for message in messages %}{{'<|im_start|>' + message['role'] + '
+ ' + message['content'] + '<|im_end|>' + '
+ '}}{% endfor %}{% if add_generation_prompt %}{{ '<|im_start|>assistant
+ ' }}{% endif %}
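The four-line Jinja template above renders ChatML-style turns: each message becomes `<|im_start|>{role}\n{content}<|im_end|>\n`, with an optional trailing assistant header. A minimal pure-Python sketch of the same logic (no jinja2 dependency; the function name is ours, not part of the repo) shows the exact string it produces:

```python
def render_chat(messages, add_generation_prompt=False):
    # Mirrors chat_template.jinja: each turn becomes
    # <|im_start|>{role}\n{content}<|im_end|>\n
    out = ""
    for m in messages:
        out += "<|im_start|>" + m["role"] + "\n" + m["content"] + "<|im_end|>" + "\n"
    # When add_generation_prompt is true, open an assistant turn for the
    # model to complete.
    if add_generation_prompt:
        out += "<|im_start|>assistant\n"
    return out

prompt = render_chat(
    [{"role": "user", "content": "Deobfuscate this code."}],
    add_generation_prompt=True,
)
print(prompt)
# <|im_start|>user
# Deobfuscate this code.<|im_end|>
# <|im_start|>assistant
```

In practice `tokenizer.apply_chat_template(...)` applies this same template automatically once the tokenizer files above are loaded.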
checkpoint-3200/merges.txt ADDED
The diff for this file is too large to render. See raw diff
 
checkpoint-3200/rng_state.pth ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:7312eb539871a9d26e9f27442e07b9ee29e9f0cb17f6fdbbd03c79475217c218
+ size 14244
checkpoint-3200/scaler.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:94590cbe77c48314bba8fc5ea19e62e4c2fd31ca42a1022d1337608a88ad9d8e
+ size 988
checkpoint-3200/scheduler.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:1c008f624a769f2c06373201d98b59ce88f962ffd48ffa1d7cbe66cd59b13d4d
+ size 1064
checkpoint-3200/special_tokens_map.json ADDED
@@ -0,0 +1,86 @@
+ {
+ "additional_special_tokens": [
+ "<|endoftext|>",
+ "<|fim_prefix|>",
+ "<|fim_middle|>",
+ "<|fim_suffix|>",
+ "<|endofprompt|>",
+ "<|_unuse_missing_100256|>",
+ "<|_unuse_missing_100261|>",
+ "<|_unuse_missing_100262|>",
+ "<|_unuse_missing_100263|>",
+ "<|_unuse_missing_100264|>",
+ "<|_unuse_missing_100265|>",
+ "<|_unuse_missing_100266|>",
+ "<|_unuse_missing_100267|>",
+ "<|_unuse_missing_100268|>",
+ "<|_unuse_missing_100269|>",
+ "<|_unuse_missing_100270|>",
+ "<|_unuse_missing_100271|>",
+ "<|im_start|>",
+ "<|im_end|>",
+ "<|stop|>",
+ "<|endofturn|>",
+ "<repo_name>",
+ "<file_sep>",
+ "<issue_start>",
+ "<issue_comment>",
+ "<issue_closed>",
+ "<jupyter_start>",
+ "<jupyter_text>",
+ "<jupyter_code>",
+ "<jupyter_output>",
+ "<jupyter_script>",
+ "<empty_output>",
+ "<code_to_intermediate>",
+ "<intermediate_to_code>",
+ "<pr>",
+ "<pr_status>",
+ "<pr_is_merged>",
+ "<pr_base>",
+ "<pr_file>",
+ "<pr_base_code>",
+ "<pr_diff>",
+ "<pr_diff_hunk>",
+ "<pr_comment>",
+ "<pr_event_id>",
+ "<pr_review>",
+ "<pr_review_state>",
+ "<pr_review_comment>",
+ "<pr_in_reply_to_review_id>",
+ "<pr_in_reply_to_comment_id>",
+ "<pr_diff_hunk_comment_line>",
+ "<NAME>",
+ "<EMAIL>",
+ "<KEY>",
+ "<PASSWORD>"
+ ],
+ "bos_token": {
+ "content": "<|endoftext|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false
+ },
+ "eos_token": {
+ "content": "<|endofturn|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false
+ },
+ "pad_token": {
+ "content": "<|endoftext|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false
+ },
+ "unk_token": {
+ "content": "<|endoftext|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false
+ }
+ }
checkpoint-3200/tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
 
checkpoint-3200/tokenizer_config.json ADDED
@@ -0,0 +1,501 @@
+ {
+ "add_prefix_space": false,
+ "added_tokens_decoder": {
+ "100256": {
+ "content": "<|_unuse_missing_100256|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "100257": {
+ "content": "<|endoftext|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "100258": {
+ "content": "<|fim_prefix|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "100259": {
+ "content": "<|fim_middle|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "100260": {
+ "content": "<|fim_suffix|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "100261": {
+ "content": "<|_unuse_missing_100261|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "100262": {
+ "content": "<|_unuse_missing_100262|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "100263": {
+ "content": "<|_unuse_missing_100263|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "100264": {
+ "content": "<|_unuse_missing_100264|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "100265": {
+ "content": "<|_unuse_missing_100265|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "100266": {
+ "content": "<|_unuse_missing_100266|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "100267": {
+ "content": "<|_unuse_missing_100267|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "100268": {
+ "content": "<|_unuse_missing_100268|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "100269": {
+ "content": "<|_unuse_missing_100269|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "100270": {
+ "content": "<|_unuse_missing_100270|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "100271": {
+ "content": "<|_unuse_missing_100271|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "100272": {
+ "content": "<|im_start|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "100273": {
+ "content": "<|im_end|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "100274": {
+ "content": "<|stop|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "100275": {
+ "content": "<|endofturn|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "100276": {
+ "content": "<|endofprompt|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "110491": {
+ "content": "<repo_name>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "110492": {
+ "content": "<file_sep>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "110493": {
+ "content": "<issue_start>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "110494": {
+ "content": "<issue_comment>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "110495": {
+ "content": "<issue_closed>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "110496": {
+ "content": "<jupyter_start>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "110497": {
+ "content": "<jupyter_text>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "110498": {
+ "content": "<jupyter_code>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "110499": {
+ "content": "<jupyter_output>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "110500": {
+ "content": "<jupyter_script>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "110501": {
+ "content": "<empty_output>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "110502": {
+ "content": "<code_to_intermediate>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "110503": {
+ "content": "<intermediate_to_code>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "110504": {
+ "content": "<pr>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "110505": {
+ "content": "<pr_status>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "110506": {
+ "content": "<pr_is_merged>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "110507": {
+ "content": "<pr_base>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "110508": {
+ "content": "<pr_file>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "110509": {
+ "content": "<pr_base_code>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "110510": {
+ "content": "<pr_diff>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "110511": {
+ "content": "<pr_diff_hunk>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "110512": {
+ "content": "<pr_comment>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "110513": {
+ "content": "<pr_event_id>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "110514": {
+ "content": "<pr_review>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "110515": {
+ "content": "<pr_review_state>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "110516": {
+ "content": "<pr_review_comment>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "110517": {
+ "content": "<pr_in_reply_to_review_id>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "110518": {
+ "content": "<pr_in_reply_to_comment_id>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "110519": {
+ "content": "<pr_diff_hunk_comment_line>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "110520": {
+ "content": "<NAME>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "110521": {
+ "content": "<EMAIL>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "110522": {
+ "content": "<KEY>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "110523": {
+ "content": "<PASSWORD>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ }
+ },
+ "additional_special_tokens": [
+ "<|endoftext|>",
+ "<|fim_prefix|>",
+ "<|fim_middle|>",
+ "<|fim_suffix|>",
+ "<|endofprompt|>",
+ "<|_unuse_missing_100256|>",
+ "<|_unuse_missing_100261|>",
+ "<|_unuse_missing_100262|>",
+ "<|_unuse_missing_100263|>",
+ "<|_unuse_missing_100264|>",
+ "<|_unuse_missing_100265|>",
+ "<|_unuse_missing_100266|>",
+ "<|_unuse_missing_100267|>",
+ "<|_unuse_missing_100268|>",
+ "<|_unuse_missing_100269|>",
+ "<|_unuse_missing_100270|>",
+ "<|_unuse_missing_100271|>",
+ "<|im_start|>",
+ "<|im_end|>",
+ "<|stop|>",
+ "<|endofturn|>",
+ "<repo_name>",
+ "<file_sep>",
+ "<issue_start>",
+ "<issue_comment>",
+ "<issue_closed>",
+ "<jupyter_start>",
+ "<jupyter_text>",
+ "<jupyter_code>",
+ "<jupyter_output>",
+ "<jupyter_script>",
+ "<empty_output>",
+ "<code_to_intermediate>",
+ "<intermediate_to_code>",
+ "<pr>",
+ "<pr_status>",
+ "<pr_is_merged>",
+ "<pr_base>",
+ "<pr_file>",
+ "<pr_base_code>",
+ "<pr_diff>",
+ "<pr_diff_hunk>",
+ "<pr_comment>",
+ "<pr_event_id>",
+ "<pr_review>",
+ "<pr_review_state>",
+ "<pr_review_comment>",
+ "<pr_in_reply_to_review_id>",
+ "<pr_in_reply_to_comment_id>",
+ "<pr_diff_hunk_comment_line>",
+ "<NAME>",
+ "<EMAIL>",
+ "<KEY>",
+ "<PASSWORD>"
+ ],
+ "bos_token": "<|endoftext|>",
+ "clean_up_tokenization_spaces": true,
+ "eos_token": "<|endofturn|>",
+ "extra_special_tokens": {},
+ "model_max_length": 1000000000000000019884624838656,
+ "pad_token": "<|endoftext|>",
+ "tokenizer_class": "GPT2Tokenizer",
+ "unk_token": "<|endoftext|>"
+ }
checkpoint-3200/trainer_state.json ADDED
@@ -0,0 +1,3074 @@
1
+ {
2
+ "best_global_step": 3200,
3
+ "best_metric": 1.8764336109161377,
4
+ "best_model_checkpoint": "/content/drive/MyDrive/hyperclova-deobfuscation-lora/checkpoint-3200",
5
+ "epoch": 2.8444444444444446,
6
+ "eval_steps": 200,
7
+ "global_step": 3200,
8
+ "is_hyper_param_search": false,
9
+ "is_local_process_zero": true,
10
+ "is_world_process_zero": true,
11
+ "log_history": [
12
+ {
13
+ "epoch": 0.008888888888888889,
14
+ "grad_norm": 3.629798412322998,
15
+ "learning_rate": 1.8e-05,
16
+ "loss": 4.1483,
17
+ "mean_token_accuracy": 0.34797456339001653,
+ "num_tokens": 11242.0,
+ "step": 10
+ },
+ {
+ "epoch": 0.017777777777777778,
+ "grad_norm": 2.6125221252441406,
+ "learning_rate": 3.8e-05,
+ "loss": 3.7515,
+ "mean_token_accuracy": 0.4058148756623268,
+ "num_tokens": 22106.0,
+ "step": 20
+ },
+ {
+ "epoch": 0.02666666666666667,
+ "grad_norm": 2.9313137531280518,
+ "learning_rate": 5.8e-05,
+ "loss": 3.3279,
+ "mean_token_accuracy": 0.4703808955848217,
+ "num_tokens": 33774.0,
+ "step": 30
+ },
+ {
+ "epoch": 0.035555555555555556,
+ "grad_norm": 2.0496416091918945,
+ "learning_rate": 7.800000000000001e-05,
+ "loss": 2.9114,
+ "mean_token_accuracy": 0.5239812344312668,
+ "num_tokens": 44943.0,
+ "step": 40
+ },
+ {
+ "epoch": 0.044444444444444446,
+ "grad_norm": 2.282668352127075,
+ "learning_rate": 9.8e-05,
+ "loss": 2.8468,
+ "mean_token_accuracy": 0.534189497679472,
+ "num_tokens": 56341.0,
+ "step": 50
+ },
+ {
+ "epoch": 0.05333333333333334,
+ "grad_norm": 2.168651819229126,
+ "learning_rate": 0.000118,
+ "loss": 2.7785,
+ "mean_token_accuracy": 0.5407359585165977,
+ "num_tokens": 67397.0,
+ "step": 60
+ },
+ {
+ "epoch": 0.06222222222222222,
+ "grad_norm": 2.289881467819214,
+ "learning_rate": 0.000138,
+ "loss": 2.736,
+ "mean_token_accuracy": 0.5326176360249519,
+ "num_tokens": 78482.0,
+ "step": 70
+ },
+ {
+ "epoch": 0.07111111111111111,
+ "grad_norm": 2.1038105487823486,
+ "learning_rate": 0.00015800000000000002,
+ "loss": 2.5855,
+ "mean_token_accuracy": 0.5618595249950886,
+ "num_tokens": 89803.0,
+ "step": 80
+ },
+ {
+ "epoch": 0.08,
+ "grad_norm": 2.24312686920166,
+ "learning_rate": 0.00017800000000000002,
+ "loss": 2.5365,
+ "mean_token_accuracy": 0.5661972932517528,
+ "num_tokens": 101015.0,
+ "step": 90
+ },
+ {
+ "epoch": 0.08888888888888889,
+ "grad_norm": 1.9482938051223755,
+ "learning_rate": 0.00019800000000000002,
+ "loss": 2.5634,
+ "mean_token_accuracy": 0.5538406319916248,
+ "num_tokens": 112364.0,
+ "step": 100
+ },
+ {
+ "epoch": 0.09777777777777778,
+ "grad_norm": 1.86210298538208,
+ "learning_rate": 0.00019945038167938932,
+ "loss": 2.4629,
+ "mean_token_accuracy": 0.5780388668179512,
+ "num_tokens": 122882.0,
+ "step": 110
+ },
+ {
+ "epoch": 0.10666666666666667,
+ "grad_norm": 1.8806918859481812,
+ "learning_rate": 0.00019883969465648855,
+ "loss": 2.5022,
+ "mean_token_accuracy": 0.563551553338766,
+ "num_tokens": 134028.0,
+ "step": 120
+ },
+ {
+ "epoch": 0.11555555555555555,
+ "grad_norm": 2.3264434337615967,
+ "learning_rate": 0.00019829007633587786,
+ "loss": 2.4065,
+ "mean_token_accuracy": 0.5807355619966984,
+ "num_tokens": 145192.0,
+ "step": 130
+ },
+ {
+ "epoch": 0.12444444444444444,
+ "grad_norm": 1.8537976741790771,
+ "learning_rate": 0.00019767938931297712,
+ "loss": 2.4838,
+ "mean_token_accuracy": 0.566282794624567,
+ "num_tokens": 156703.0,
+ "step": 140
+ },
+ {
+ "epoch": 0.13333333333333333,
+ "grad_norm": 2.0960652828216553,
+ "learning_rate": 0.00019706870229007636,
+ "loss": 2.4119,
+ "mean_token_accuracy": 0.5830203481018543,
+ "num_tokens": 168041.0,
+ "step": 150
+ },
+ {
+ "epoch": 0.14222222222222222,
+ "grad_norm": 2.2244813442230225,
+ "learning_rate": 0.00019645801526717557,
+ "loss": 2.3726,
+ "mean_token_accuracy": 0.5844443172216416,
+ "num_tokens": 178986.0,
+ "step": 160
+ },
+ {
+ "epoch": 0.1511111111111111,
+ "grad_norm": 1.8238722085952759,
+ "learning_rate": 0.0001958473282442748,
+ "loss": 2.4419,
+ "mean_token_accuracy": 0.5708602093160152,
+ "num_tokens": 190391.0,
+ "step": 170
+ },
+ {
+ "epoch": 0.16,
+ "grad_norm": 1.7154136896133423,
+ "learning_rate": 0.00019523664122137407,
+ "loss": 2.4293,
+ "mean_token_accuracy": 0.5748118035495281,
+ "num_tokens": 201989.0,
+ "step": 180
+ },
+ {
+ "epoch": 0.1688888888888889,
+ "grad_norm": 1.7582788467407227,
+ "learning_rate": 0.0001946259541984733,
+ "loss": 2.3577,
+ "mean_token_accuracy": 0.5877166777849198,
+ "num_tokens": 212914.0,
+ "step": 190
+ },
+ {
+ "epoch": 0.17777777777777778,
+ "grad_norm": 1.8613263368606567,
+ "learning_rate": 0.0001940152671755725,
+ "loss": 2.3486,
+ "mean_token_accuracy": 0.5889834299683571,
+ "num_tokens": 223936.0,
+ "step": 200
+ },
+ {
+ "epoch": 0.17777777777777778,
+ "eval_loss": 2.3320820331573486,
+ "eval_mean_token_accuracy": 0.5868698905706405,
+ "eval_num_tokens": 223936.0,
+ "eval_runtime": 49.2429,
+ "eval_samples_per_second": 20.307,
+ "eval_steps_per_second": 10.154,
+ "step": 200
+ },
+ {
+ "epoch": 0.18666666666666668,
+ "grad_norm": 1.8486477136611938,
+ "learning_rate": 0.00019340458015267175,
+ "loss": 2.3666,
+ "mean_token_accuracy": 0.5847611322999,
+ "num_tokens": 235036.0,
+ "step": 210
+ },
+ {
+ "epoch": 0.19555555555555557,
+ "grad_norm": 2.018049478530884,
+ "learning_rate": 0.000192793893129771,
+ "loss": 2.2689,
+ "mean_token_accuracy": 0.59971177354455,
+ "num_tokens": 246101.0,
+ "step": 220
+ },
+ {
+ "epoch": 0.20444444444444446,
+ "grad_norm": 1.7244890928268433,
+ "learning_rate": 0.00019218320610687024,
+ "loss": 2.3262,
+ "mean_token_accuracy": 0.5855986528098583,
+ "num_tokens": 257953.0,
+ "step": 230
+ },
+ {
+ "epoch": 0.21333333333333335,
+ "grad_norm": 1.8928934335708618,
+ "learning_rate": 0.00019157251908396948,
+ "loss": 2.3318,
+ "mean_token_accuracy": 0.5885626815259457,
+ "num_tokens": 269187.0,
+ "step": 240
+ },
+ {
+ "epoch": 0.2222222222222222,
+ "grad_norm": 1.7358920574188232,
+ "learning_rate": 0.0001909618320610687,
+ "loss": 2.2145,
+ "mean_token_accuracy": 0.6092555984854698,
+ "num_tokens": 279762.0,
+ "step": 250
+ },
+ {
+ "epoch": 0.2311111111111111,
+ "grad_norm": 1.6779032945632935,
+ "learning_rate": 0.00019035114503816795,
+ "loss": 2.3152,
+ "mean_token_accuracy": 0.584602715075016,
+ "num_tokens": 291454.0,
+ "step": 260
+ },
+ {
+ "epoch": 0.24,
+ "grad_norm": 1.6310207843780518,
+ "learning_rate": 0.0001897404580152672,
+ "loss": 2.2669,
+ "mean_token_accuracy": 0.5965895019471645,
+ "num_tokens": 302969.0,
+ "step": 270
+ },
+ {
+ "epoch": 0.24888888888888888,
+ "grad_norm": 1.6765615940093994,
+ "learning_rate": 0.00018912977099236642,
+ "loss": 2.269,
+ "mean_token_accuracy": 0.5934441670775413,
+ "num_tokens": 314204.0,
+ "step": 280
+ },
+ {
+ "epoch": 0.2577777777777778,
+ "grad_norm": 1.793959617614746,
+ "learning_rate": 0.00018851908396946566,
+ "loss": 2.2554,
+ "mean_token_accuracy": 0.600947193801403,
+ "num_tokens": 325649.0,
+ "step": 290
+ },
+ {
+ "epoch": 0.26666666666666666,
+ "grad_norm": 1.7492129802703857,
+ "learning_rate": 0.0001879083969465649,
+ "loss": 2.2157,
+ "mean_token_accuracy": 0.6022505328059197,
+ "num_tokens": 337167.0,
+ "step": 300
+ },
+ {
+ "epoch": 0.27555555555555555,
+ "grad_norm": 1.803576946258545,
+ "learning_rate": 0.00018729770992366413,
+ "loss": 2.2854,
+ "mean_token_accuracy": 0.5923042424023152,
+ "num_tokens": 348621.0,
+ "step": 310
+ },
+ {
+ "epoch": 0.28444444444444444,
+ "grad_norm": 1.9662351608276367,
+ "learning_rate": 0.00018668702290076337,
+ "loss": 2.2639,
+ "mean_token_accuracy": 0.588193366676569,
+ "num_tokens": 360272.0,
+ "step": 320
+ },
+ {
+ "epoch": 0.29333333333333333,
+ "grad_norm": 1.6725891828536987,
+ "learning_rate": 0.0001860763358778626,
+ "loss": 2.2249,
+ "mean_token_accuracy": 0.6054098337888718,
+ "num_tokens": 371346.0,
+ "step": 330
+ },
+ {
+ "epoch": 0.3022222222222222,
+ "grad_norm": 1.68416166305542,
+ "learning_rate": 0.00018546564885496184,
+ "loss": 2.1678,
+ "mean_token_accuracy": 0.6146526508033275,
+ "num_tokens": 382779.0,
+ "step": 340
+ },
+ {
+ "epoch": 0.3111111111111111,
+ "grad_norm": 1.7218507528305054,
+ "learning_rate": 0.00018485496183206108,
+ "loss": 2.2011,
+ "mean_token_accuracy": 0.6104303196072578,
+ "num_tokens": 393823.0,
+ "step": 350
+ },
+ {
+ "epoch": 0.32,
+ "grad_norm": 1.6817256212234497,
+ "learning_rate": 0.0001842442748091603,
+ "loss": 2.2264,
+ "mean_token_accuracy": 0.5987282857298851,
+ "num_tokens": 405438.0,
+ "step": 360
+ },
+ {
+ "epoch": 0.3288888888888889,
+ "grad_norm": 1.7454718351364136,
+ "learning_rate": 0.00018363358778625955,
+ "loss": 2.2712,
+ "mean_token_accuracy": 0.5939777493476868,
+ "num_tokens": 417299.0,
+ "step": 370
+ },
+ {
+ "epoch": 0.3377777777777778,
+ "grad_norm": 2.011315107345581,
+ "learning_rate": 0.00018302290076335878,
+ "loss": 2.2247,
+ "mean_token_accuracy": 0.6061037018895149,
+ "num_tokens": 428660.0,
+ "step": 380
+ },
+ {
+ "epoch": 0.3466666666666667,
+ "grad_norm": 1.6242053508758545,
+ "learning_rate": 0.00018241221374045802,
+ "loss": 2.232,
+ "mean_token_accuracy": 0.6062197655439376,
+ "num_tokens": 439768.0,
+ "step": 390
+ },
+ {
+ "epoch": 0.35555555555555557,
+ "grad_norm": 1.9328559637069702,
+ "learning_rate": 0.00018180152671755725,
+ "loss": 2.1291,
+ "mean_token_accuracy": 0.6168317429721355,
+ "num_tokens": 450808.0,
+ "step": 400
+ },
+ {
+ "epoch": 0.35555555555555557,
+ "eval_loss": 2.1662538051605225,
+ "eval_mean_token_accuracy": 0.6099509916305542,
+ "eval_num_tokens": 450808.0,
+ "eval_runtime": 49.4213,
+ "eval_samples_per_second": 20.234,
+ "eval_steps_per_second": 10.117,
+ "step": 400
+ },
+ {
+ "epoch": 0.36444444444444446,
+ "grad_norm": 1.8797143697738647,
+ "learning_rate": 0.0001811908396946565,
+ "loss": 2.2086,
+ "mean_token_accuracy": 0.6012695133686066,
+ "num_tokens": 461592.0,
+ "step": 410
+ },
+ {
+ "epoch": 0.37333333333333335,
+ "grad_norm": 1.7558225393295288,
+ "learning_rate": 0.00018058015267175575,
+ "loss": 2.1771,
+ "mean_token_accuracy": 0.6060668036341668,
+ "num_tokens": 473434.0,
+ "step": 420
+ },
+ {
+ "epoch": 0.38222222222222224,
+ "grad_norm": 1.845051884651184,
+ "learning_rate": 0.00017996946564885496,
+ "loss": 2.2576,
+ "mean_token_accuracy": 0.5929104581475257,
+ "num_tokens": 485130.0,
+ "step": 430
+ },
+ {
+ "epoch": 0.39111111111111113,
+ "grad_norm": 1.6992298364639282,
+ "learning_rate": 0.0001793587786259542,
+ "loss": 2.1815,
+ "mean_token_accuracy": 0.6100690707564353,
+ "num_tokens": 496482.0,
+ "step": 440
+ },
+ {
+ "epoch": 0.4,
+ "grad_norm": 1.7239253520965576,
+ "learning_rate": 0.00017874809160305343,
+ "loss": 2.2082,
+ "mean_token_accuracy": 0.6001435503363609,
+ "num_tokens": 508218.0,
+ "step": 450
+ },
+ {
+ "epoch": 0.4088888888888889,
+ "grad_norm": 1.7856336832046509,
+ "learning_rate": 0.0001781374045801527,
+ "loss": 2.1593,
+ "mean_token_accuracy": 0.6118309393525123,
+ "num_tokens": 519379.0,
+ "step": 460
+ },
+ {
+ "epoch": 0.4177777777777778,
+ "grad_norm": 1.611831545829773,
+ "learning_rate": 0.00017752671755725193,
+ "loss": 2.1797,
+ "mean_token_accuracy": 0.6033190444111824,
+ "num_tokens": 530561.0,
+ "step": 470
+ },
+ {
+ "epoch": 0.4266666666666667,
+ "grad_norm": 1.7420586347579956,
+ "learning_rate": 0.00017691603053435114,
+ "loss": 2.2027,
+ "mean_token_accuracy": 0.6067790001630783,
+ "num_tokens": 542631.0,
+ "step": 480
+ },
+ {
+ "epoch": 0.43555555555555553,
+ "grad_norm": 1.948723316192627,
+ "learning_rate": 0.00017630534351145038,
+ "loss": 2.1753,
+ "mean_token_accuracy": 0.6109650492668152,
+ "num_tokens": 553477.0,
+ "step": 490
+ },
+ {
+ "epoch": 0.4444444444444444,
+ "grad_norm": 1.7983819246292114,
+ "learning_rate": 0.00017569465648854964,
+ "loss": 2.158,
+ "mean_token_accuracy": 0.5996212616562844,
+ "num_tokens": 565400.0,
+ "step": 500
+ },
+ {
+ "epoch": 0.4533333333333333,
+ "grad_norm": 1.842372179031372,
+ "learning_rate": 0.00017508396946564888,
+ "loss": 2.0825,
+ "mean_token_accuracy": 0.6168116196990013,
+ "num_tokens": 576953.0,
+ "step": 510
+ },
+ {
+ "epoch": 0.4622222222222222,
+ "grad_norm": 1.91799795627594,
+ "learning_rate": 0.00017447328244274809,
+ "loss": 2.1022,
+ "mean_token_accuracy": 0.6168905258178711,
+ "num_tokens": 588003.0,
+ "step": 520
+ },
+ {
+ "epoch": 0.4711111111111111,
+ "grad_norm": 1.7727124691009521,
+ "learning_rate": 0.00017386259541984732,
+ "loss": 2.1695,
+ "mean_token_accuracy": 0.5997609972953797,
+ "num_tokens": 600043.0,
+ "step": 530
+ },
+ {
+ "epoch": 0.48,
+ "grad_norm": 1.8602296113967896,
+ "learning_rate": 0.00017325190839694658,
+ "loss": 2.0849,
+ "mean_token_accuracy": 0.6266478568315506,
+ "num_tokens": 610974.0,
+ "step": 540
+ },
+ {
+ "epoch": 0.4888888888888889,
+ "grad_norm": 1.545620083808899,
+ "learning_rate": 0.00017264122137404582,
+ "loss": 2.1824,
+ "mean_token_accuracy": 0.6072694823145867,
+ "num_tokens": 622632.0,
+ "step": 550
+ },
+ {
+ "epoch": 0.49777777777777776,
+ "grad_norm": 1.7485988140106201,
+ "learning_rate": 0.00017203053435114506,
+ "loss": 2.1374,
+ "mean_token_accuracy": 0.6164417043328285,
+ "num_tokens": 634093.0,
+ "step": 560
+ },
+ {
+ "epoch": 0.5066666666666667,
+ "grad_norm": 1.8591196537017822,
+ "learning_rate": 0.00017141984732824426,
+ "loss": 2.0928,
+ "mean_token_accuracy": 0.6241554819047451,
+ "num_tokens": 645226.0,
+ "step": 570
+ },
+ {
+ "epoch": 0.5155555555555555,
+ "grad_norm": 1.8163517713546753,
+ "learning_rate": 0.00017080916030534353,
+ "loss": 2.0476,
+ "mean_token_accuracy": 0.6285594403743744,
+ "num_tokens": 656188.0,
+ "step": 580
+ },
+ {
+ "epoch": 0.5244444444444445,
+ "grad_norm": 1.7729696035385132,
+ "learning_rate": 0.00017019847328244276,
+ "loss": 2.1036,
+ "mean_token_accuracy": 0.6208315283060074,
+ "num_tokens": 667642.0,
+ "step": 590
+ },
+ {
+ "epoch": 0.5333333333333333,
+ "grad_norm": 1.7804032564163208,
+ "learning_rate": 0.000169587786259542,
+ "loss": 2.1174,
+ "mean_token_accuracy": 0.6148250237107277,
+ "num_tokens": 678769.0,
+ "step": 600
+ },
+ {
+ "epoch": 0.5333333333333333,
+ "eval_loss": 2.0850696563720703,
+ "eval_mean_token_accuracy": 0.6197466601729393,
+ "eval_num_tokens": 678769.0,
+ "eval_runtime": 49.7611,
+ "eval_samples_per_second": 20.096,
+ "eval_steps_per_second": 10.048,
+ "step": 600
+ },
+ {
+ "epoch": 0.5422222222222223,
+ "grad_norm": 1.8643274307250977,
+ "learning_rate": 0.00016897709923664124,
+ "loss": 2.0485,
+ "mean_token_accuracy": 0.6331146821379662,
+ "num_tokens": 690014.0,
+ "step": 610
+ },
+ {
+ "epoch": 0.5511111111111111,
+ "grad_norm": 1.8060939311981201,
+ "learning_rate": 0.00016836641221374047,
+ "loss": 2.1117,
+ "mean_token_accuracy": 0.612041813135147,
+ "num_tokens": 701734.0,
+ "step": 620
+ },
+ {
+ "epoch": 0.56,
+ "grad_norm": 1.7059085369110107,
+ "learning_rate": 0.0001677557251908397,
+ "loss": 2.0747,
+ "mean_token_accuracy": 0.6174572542309761,
+ "num_tokens": 713570.0,
+ "step": 630
+ },
+ {
+ "epoch": 0.5688888888888889,
+ "grad_norm": 1.6600592136383057,
+ "learning_rate": 0.00016714503816793894,
+ "loss": 2.0685,
+ "mean_token_accuracy": 0.6293445661664009,
+ "num_tokens": 724815.0,
+ "step": 640
+ },
+ {
+ "epoch": 0.5777777777777777,
+ "grad_norm": 1.6598913669586182,
+ "learning_rate": 0.00016653435114503818,
+ "loss": 2.0255,
+ "mean_token_accuracy": 0.6309839904308319,
+ "num_tokens": 735777.0,
+ "step": 650
+ },
+ {
+ "epoch": 0.5866666666666667,
+ "grad_norm": 1.8306963443756104,
+ "learning_rate": 0.00016592366412213741,
+ "loss": 2.1249,
+ "mean_token_accuracy": 0.6147443532943726,
+ "num_tokens": 746903.0,
+ "step": 660
+ },
+ {
+ "epoch": 0.5955555555555555,
+ "grad_norm": 1.626795768737793,
+ "learning_rate": 0.00016531297709923665,
+ "loss": 2.0694,
+ "mean_token_accuracy": 0.6254988595843315,
+ "num_tokens": 757881.0,
+ "step": 670
+ },
+ {
+ "epoch": 0.6044444444444445,
+ "grad_norm": 1.710806131362915,
+ "learning_rate": 0.00016470229007633589,
+ "loss": 2.0397,
+ "mean_token_accuracy": 0.6233279958367348,
+ "num_tokens": 768982.0,
+ "step": 680
+ },
+ {
+ "epoch": 0.6133333333333333,
+ "grad_norm": 1.7051280736923218,
+ "learning_rate": 0.00016409160305343512,
+ "loss": 2.116,
+ "mean_token_accuracy": 0.6183760315179825,
+ "num_tokens": 780072.0,
+ "step": 690
+ },
+ {
+ "epoch": 0.6222222222222222,
+ "grad_norm": 1.607917070388794,
+ "learning_rate": 0.00016348091603053436,
+ "loss": 2.0478,
+ "mean_token_accuracy": 0.6331974640488625,
+ "num_tokens": 791061.0,
+ "step": 700
+ },
+ {
+ "epoch": 0.6311111111111111,
+ "grad_norm": 1.7803592681884766,
+ "learning_rate": 0.0001628702290076336,
+ "loss": 2.0595,
+ "mean_token_accuracy": 0.6249041527509689,
+ "num_tokens": 801867.0,
+ "step": 710
+ },
+ {
+ "epoch": 0.64,
+ "grad_norm": 1.6132373809814453,
+ "learning_rate": 0.00016225954198473283,
+ "loss": 2.0789,
+ "mean_token_accuracy": 0.6235784366726875,
+ "num_tokens": 813112.0,
+ "step": 720
+ },
+ {
+ "epoch": 0.6488888888888888,
+ "grad_norm": 1.790528655052185,
+ "learning_rate": 0.00016164885496183207,
+ "loss": 2.0632,
+ "mean_token_accuracy": 0.6268924325704575,
+ "num_tokens": 824133.0,
+ "step": 730
+ },
+ {
+ "epoch": 0.6577777777777778,
+ "grad_norm": 2.0007362365722656,
+ "learning_rate": 0.0001610381679389313,
+ "loss": 2.0701,
+ "mean_token_accuracy": 0.6189413338899612,
+ "num_tokens": 835469.0,
+ "step": 740
+ },
+ {
+ "epoch": 0.6666666666666666,
+ "grad_norm": 2.227158546447754,
+ "learning_rate": 0.00016042748091603054,
+ "loss": 2.0339,
+ "mean_token_accuracy": 0.621903920173645,
+ "num_tokens": 846572.0,
+ "step": 750
+ },
+ {
+ "epoch": 0.6755555555555556,
+ "grad_norm": 1.80472731590271,
+ "learning_rate": 0.00015981679389312977,
+ "loss": 2.1285,
+ "mean_token_accuracy": 0.604806374013424,
+ "num_tokens": 857795.0,
+ "step": 760
+ },
+ {
+ "epoch": 0.6844444444444444,
+ "grad_norm": 1.7893937826156616,
+ "learning_rate": 0.000159206106870229,
+ "loss": 2.0347,
+ "mean_token_accuracy": 0.6292635962367058,
+ "num_tokens": 868429.0,
+ "step": 770
+ },
+ {
+ "epoch": 0.6933333333333334,
+ "grad_norm": 1.6761573553085327,
+ "learning_rate": 0.00015859541984732824,
+ "loss": 2.0591,
+ "mean_token_accuracy": 0.6254431992769242,
+ "num_tokens": 879659.0,
+ "step": 780
+ },
+ {
+ "epoch": 0.7022222222222222,
+ "grad_norm": 1.803045630455017,
+ "learning_rate": 0.0001579847328244275,
+ "loss": 2.0293,
+ "mean_token_accuracy": 0.6273573949933052,
+ "num_tokens": 890911.0,
+ "step": 790
+ },
+ {
+ "epoch": 0.7111111111111111,
+ "grad_norm": 1.7385220527648926,
+ "learning_rate": 0.00015737404580152672,
+ "loss": 2.0197,
+ "mean_token_accuracy": 0.63025072067976,
+ "num_tokens": 902240.0,
+ "step": 800
+ },
+ {
+ "epoch": 0.7111111111111111,
+ "eval_loss": 2.0297935009002686,
+ "eval_mean_token_accuracy": 0.628437293112278,
+ "eval_num_tokens": 902240.0,
+ "eval_runtime": 49.3011,
+ "eval_samples_per_second": 20.284,
+ "eval_steps_per_second": 10.142,
+ "step": 800
+ },
+ {
+ "epoch": 0.72,
+ "grad_norm": 1.8906656503677368,
+ "learning_rate": 0.00015676335877862595,
+ "loss": 2.0806,
+ "mean_token_accuracy": 0.619849094748497,
+ "num_tokens": 914009.0,
+ "step": 810
+ },
+ {
+ "epoch": 0.7288888888888889,
+ "grad_norm": 1.714268684387207,
+ "learning_rate": 0.0001561526717557252,
+ "loss": 2.0343,
+ "mean_token_accuracy": 0.632188580930233,
+ "num_tokens": 925091.0,
+ "step": 820
+ },
+ {
+ "epoch": 0.7377777777777778,
+ "grad_norm": 1.833918809890747,
+ "learning_rate": 0.00015554198473282445,
+ "loss": 2.0747,
+ "mean_token_accuracy": 0.6280180156230927,
+ "num_tokens": 936675.0,
+ "step": 830
+ },
+ {
+ "epoch": 0.7466666666666667,
+ "grad_norm": 1.9817575216293335,
+ "learning_rate": 0.00015493129770992366,
+ "loss": 2.0859,
+ "mean_token_accuracy": 0.6128378361463547,
+ "num_tokens": 948151.0,
+ "step": 840
+ },
+ {
+ "epoch": 0.7555555555555555,
+ "grad_norm": 1.5982656478881836,
+ "learning_rate": 0.0001543206106870229,
+ "loss": 2.0455,
+ "mean_token_accuracy": 0.6276382938027382,
+ "num_tokens": 959266.0,
+ "step": 850
+ },
+ {
+ "epoch": 0.7644444444444445,
+ "grad_norm": 1.7298970222473145,
+ "learning_rate": 0.00015370992366412213,
+ "loss": 1.9604,
+ "mean_token_accuracy": 0.6377590849995614,
+ "num_tokens": 970339.0,
+ "step": 860
+ },
+ {
+ "epoch": 0.7733333333333333,
+ "grad_norm": 1.8064581155776978,
+ "learning_rate": 0.0001530992366412214,
+ "loss": 2.0698,
+ "mean_token_accuracy": 0.6194617792963981,
+ "num_tokens": 981805.0,
+ "step": 870
+ },
+ {
+ "epoch": 0.7822222222222223,
+ "grad_norm": 1.5860410928726196,
+ "learning_rate": 0.00015248854961832063,
+ "loss": 2.0182,
+ "mean_token_accuracy": 0.6292306095361709,
+ "num_tokens": 993552.0,
+ "step": 880
+ },
+ {
+ "epoch": 0.7911111111111111,
+ "grad_norm": 1.8761259317398071,
+ "learning_rate": 0.00015187786259541984,
+ "loss": 2.0335,
+ "mean_token_accuracy": 0.6285651385784149,
+ "num_tokens": 1004400.0,
+ "step": 890
+ },
+ {
+ "epoch": 0.8,
+ "grad_norm": 1.6973590850830078,
+ "learning_rate": 0.00015126717557251908,
+ "loss": 2.0927,
+ "mean_token_accuracy": 0.6183614790439605,
+ "num_tokens": 1015564.0,
+ "step": 900
+ },
+ {
+ "epoch": 0.8088888888888889,
+ "grad_norm": 1.6477675437927246,
+ "learning_rate": 0.00015065648854961834,
+ "loss": 1.9187,
+ "mean_token_accuracy": 0.6427812784910202,
+ "num_tokens": 1026849.0,
+ "step": 910
+ },
+ {
+ "epoch": 0.8177777777777778,
+ "grad_norm": 1.6942589282989502,
+ "learning_rate": 0.00015004580152671757,
+ "loss": 2.0139,
+ "mean_token_accuracy": 0.6322552219033242,
+ "num_tokens": 1037721.0,
+ "step": 920
+ },
+ {
+ "epoch": 0.8266666666666667,
+ "grad_norm": 1.6394822597503662,
+ "learning_rate": 0.0001494351145038168,
+ "loss": 2.0392,
+ "mean_token_accuracy": 0.6273665294051171,
+ "num_tokens": 1048986.0,
+ "step": 930
+ },
+ {
+ "epoch": 0.8355555555555556,
+ "grad_norm": 1.697804570198059,
+ "learning_rate": 0.00014882442748091602,
+ "loss": 2.0412,
+ "mean_token_accuracy": 0.625536386668682,
+ "num_tokens": 1060627.0,
+ "step": 940
+ },
+ {
+ "epoch": 0.8444444444444444,
+ "grad_norm": 1.8058092594146729,
+ "learning_rate": 0.00014821374045801528,
+ "loss": 1.9737,
+ "mean_token_accuracy": 0.6332821652293206,
+ "num_tokens": 1071482.0,
+ "step": 950
+ },
+ {
+ "epoch": 0.8533333333333334,
+ "grad_norm": 1.773294448852539,
+ "learning_rate": 0.00014760305343511452,
+ "loss": 2.054,
+ "mean_token_accuracy": 0.6256278708577157,
+ "num_tokens": 1082672.0,
+ "step": 960
+ },
+ {
+ "epoch": 0.8622222222222222,
+ "grad_norm": 1.6936707496643066,
+ "learning_rate": 0.00014699236641221375,
+ "loss": 1.9957,
+ "mean_token_accuracy": 0.6333451583981514,
+ "num_tokens": 1093493.0,
+ "step": 970
+ },
+ {
+ "epoch": 0.8711111111111111,
+ "grad_norm": 1.7029008865356445,
+ "learning_rate": 0.000146381679389313,
+ "loss": 2.0526,
+ "mean_token_accuracy": 0.6244132176041604,
+ "num_tokens": 1104857.0,
+ "step": 980
+ },
+ {
+ "epoch": 0.88,
+ "grad_norm": 1.8421082496643066,
+ "learning_rate": 0.00014577099236641223,
+ "loss": 2.0311,
+ "mean_token_accuracy": 0.6236826583743096,
+ "num_tokens": 1116131.0,
+ "step": 990
+ },
+ {
+ "epoch": 0.8888888888888888,
+ "grad_norm": 1.646053433418274,
+ "learning_rate": 0.00014516030534351146,
+ "loss": 1.9973,
+ "mean_token_accuracy": 0.6274659112095833,
+ "num_tokens": 1127612.0,
+ "step": 1000
+ },
+ {
+ "epoch": 0.8888888888888888,
+ "eval_loss": 1.989682674407959,
+ "eval_mean_token_accuracy": 0.633990108013153,
+ "eval_num_tokens": 1127612.0,
+ "eval_runtime": 49.3043,
+ "eval_samples_per_second": 20.282,
+ "eval_steps_per_second": 10.141,
+ "step": 1000
+ },
+ {
+ "epoch": 0.8977777777777778,
+ "grad_norm": 1.5941271781921387,
+ "learning_rate": 0.0001445496183206107,
+ "loss": 2.0579,
+ "mean_token_accuracy": 0.6256210282444954,
+ "num_tokens": 1138866.0,
+ "step": 1010
+ },
+ {
+ "epoch": 0.9066666666666666,
+ "grad_norm": 1.7826253175735474,
+ "learning_rate": 0.00014393893129770993,
+ "loss": 1.9866,
+ "mean_token_accuracy": 0.6332772478461266,
+ "num_tokens": 1150411.0,
+ "step": 1020
+ },
+ {
+ "epoch": 0.9155555555555556,
+ "grad_norm": 1.8722221851348877,
+ "learning_rate": 0.00014332824427480917,
+ "loss": 2.0398,
+ "mean_token_accuracy": 0.627329595386982,
+ "num_tokens": 1161360.0,
+ "step": 1030
+ },
+ {
+ "epoch": 0.9244444444444444,
+ "grad_norm": 1.6533294916152954,
+ "learning_rate": 0.0001427175572519084,
+ "loss": 2.0271,
+ "mean_token_accuracy": 0.6259514302015304,
+ "num_tokens": 1172683.0,
+ "step": 1040
+ },
+ {
+ "epoch": 0.9333333333333333,
+ "grad_norm": 1.5746543407440186,
+ "learning_rate": 0.00014210687022900764,
+ "loss": 1.9634,
+ "mean_token_accuracy": 0.6359310179948807,
+ "num_tokens": 1183277.0,
+ "step": 1050
+ },
+ {
+ "epoch": 0.9422222222222222,
+ "grad_norm": 1.6094276905059814,
+ "learning_rate": 0.00014149618320610688,
+ "loss": 1.9195,
+ "mean_token_accuracy": 0.649330523610115,
+ "num_tokens": 1194160.0,
+ "step": 1060
+ },
+ {
+ "epoch": 0.9511111111111111,
+ "grad_norm": 1.9643882513046265,
+ "learning_rate": 0.0001408854961832061,
+ "loss": 2.0042,
+ "mean_token_accuracy": 0.6356254667043686,
+ "num_tokens": 1205308.0,
+ "step": 1070
+ },
+ {
+ "epoch": 0.96,
+ "grad_norm": 1.8238948583602905,
+ "learning_rate": 0.00014027480916030535,
+ "loss": 1.9172,
+ "mean_token_accuracy": 0.6497033536434174,
+ "num_tokens": 1215760.0,
+ "step": 1080
+ },
+ {
+ "epoch": 0.9688888888888889,
+ "grad_norm": 1.7422380447387695,
+ "learning_rate": 0.00013966412213740458,
+ "loss": 2.0213,
+ "mean_token_accuracy": 0.6309294819831848,
+ "num_tokens": 1226775.0,
+ "step": 1090
+ },
+ {
+ "epoch": 0.9777777777777777,
+ "grad_norm": 1.651795744895935,
+ "learning_rate": 0.00013905343511450382,
+ "loss": 2.033,
+ "mean_token_accuracy": 0.6295390352606773,
+ "num_tokens": 1238191.0,
+ "step": 1100
+ },
+ {
+ "epoch": 0.9866666666666667,
+ "grad_norm": 1.673543095588684,
+ "learning_rate": 0.00013844274809160308,
+ "loss": 2.0085,
+ "mean_token_accuracy": 0.6329691678285598,
+ "num_tokens": 1249561.0,
+ "step": 1110
+ },
+ {
+ "epoch": 0.9955555555555555,
+ "grad_norm": 1.7423163652420044,
+ "learning_rate": 0.0001378320610687023,
+ "loss": 1.9751,
+ "mean_token_accuracy": 0.6307685926556588,
+ "num_tokens": 1260429.0,
+ "step": 1120
+ },
+ {
+ "epoch": 1.0044444444444445,
+ "grad_norm": 1.4878981113433838,
+ "learning_rate": 0.00013722137404580153,
+ "loss": 1.9171,
+ "mean_token_accuracy": 0.644737622141838,
+ "num_tokens": 1271111.0,
+ "step": 1130
+ },
+ {
+ "epoch": 1.0133333333333334,
+ "grad_norm": 1.5343797206878662,
+ "learning_rate": 0.00013661068702290076,
+ "loss": 1.8544,
+ "mean_token_accuracy": 0.6503374725580215,
+ "num_tokens": 1282434.0,
+ "step": 1140
+ },
+ {
+ "epoch": 1.0222222222222221,
+ "grad_norm": 1.5450340509414673,
+ "learning_rate": 0.00013600000000000003,
+ "loss": 1.828,
+ "mean_token_accuracy": 0.6514182686805725,
+ "num_tokens": 1294382.0,
+ "step": 1150
+ },
+ {
+ "epoch": 1.031111111111111,
+ "grad_norm": 1.8313877582550049,
+ "learning_rate": 0.00013538931297709923,
+ "loss": 1.7704,
+ "mean_token_accuracy": 0.6693721905350685,
+ "num_tokens": 1305343.0,
+ "step": 1160
+ },
+ {
+ "epoch": 1.04,
+ "grad_norm": 1.8418430089950562,
+ "learning_rate": 0.00013477862595419847,
+ "loss": 1.7591,
+ "mean_token_accuracy": 0.67226582467556,
+ "num_tokens": 1316558.0,
+ "step": 1170
+ },
+ {
+ "epoch": 1.048888888888889,
+ "grad_norm": 1.6022825241088867,
+ "learning_rate": 0.0001341679389312977,
+ "loss": 1.8048,
+ "mean_token_accuracy": 0.6629651457071304,
+ "num_tokens": 1327938.0,
+ "step": 1180
+ },
+ {
+ "epoch": 1.0577777777777777,
+ "grad_norm": 1.5888707637786865,
+ "learning_rate": 0.00013355725190839697,
+ "loss": 1.773,
+ "mean_token_accuracy": 0.6730352655053139,
+ "num_tokens": 1338732.0,
+ "step": 1190
+ },
+ {
+ "epoch": 1.0666666666666667,
+ "grad_norm": 1.833946943283081,
+ "learning_rate": 0.0001329465648854962,
+ "loss": 1.7887,
+ "mean_token_accuracy": 0.6616317644715309,
+ "num_tokens": 1350096.0,
+ "step": 1200
+ },
+ {
+ "epoch": 1.0666666666666667,
+ "eval_loss": 1.9697085618972778,
+ "eval_mean_token_accuracy": 0.6378205664157868,
+ "eval_num_tokens": 1350096.0,
+ "eval_runtime": 49.9237,
+ "eval_samples_per_second": 20.031,
+ "eval_steps_per_second": 10.015,
+ "step": 1200
+ },
+ {
+ "epoch": 1.0755555555555556,
+ "grad_norm": 1.6338160037994385,
+ "learning_rate": 0.00013233587786259541,
+ "loss": 1.7889,
+ "mean_token_accuracy": 0.6668319672346115,
+ "num_tokens": 1360771.0,
+ "step": 1210
+ },
+ {
+ "epoch": 1.0844444444444445,
+ "grad_norm": 1.8737561702728271,
+ "learning_rate": 0.00013172519083969465,
+ "loss": 1.7997,
+ "mean_token_accuracy": 0.6570939287543297,
+ "num_tokens": 1372450.0,
+ "step": 1220
+ },
+ {
+ "epoch": 1.0933333333333333,
+ "grad_norm": 1.758074402809143,
+ "learning_rate": 0.0001311145038167939,
+ "loss": 1.8457,
+ "mean_token_accuracy": 0.653074924647808,
+ "num_tokens": 1383711.0,
+ "step": 1230
+ },
+ {
+ "epoch": 1.1022222222222222,
+ "grad_norm": 1.839158296585083,
+ "learning_rate": 0.00013050381679389315,
+ "loss": 1.8013,
+ "mean_token_accuracy": 0.6608111187815666,
+ "num_tokens": 1394856.0,
+ "step": 1240
+ },
+ {
+ "epoch": 1.1111111111111112,
+ "grad_norm": 1.733567476272583,
+ "learning_rate": 0.00012989312977099238,
+ "loss": 1.7814,
+ "mean_token_accuracy": 0.6655508041381836,
+ "num_tokens": 1406193.0,
+ "step": 1250
+ },
+ {
+ "epoch": 1.12,
+ "grad_norm": 1.6274900436401367,
+ "learning_rate": 0.0001292824427480916,
+ "loss": 1.858,
+ "mean_token_accuracy": 0.6488608077168465,
+ "num_tokens": 1417607.0,
+ "step": 1260
+ },
+ {
+ "epoch": 1.1288888888888888,
+ "grad_norm": 1.690090537071228,
+ "learning_rate": 0.00012867175572519086,
+ "loss": 1.8256,
+ "mean_token_accuracy": 0.6595686703920365,
+ "num_tokens": 1429073.0,
+ "step": 1270
+ },
+ {
+ "epoch": 1.1377777777777778,
+ "grad_norm": 1.6638071537017822,
1218
+ "learning_rate": 0.0001280610687022901,
1219
+ "loss": 1.8334,
1220
+ "mean_token_accuracy": 0.6580470725893974,
1221
+ "num_tokens": 1440194.0,
1222
+ "step": 1280
1223
+ },
1224
+ {
1225
+ "epoch": 1.1466666666666667,
1226
+ "grad_norm": 1.8339307308197021,
1227
+ "learning_rate": 0.00012745038167938933,
1228
+ "loss": 1.783,
1229
+ "mean_token_accuracy": 0.6632378786802292,
1230
+ "num_tokens": 1451221.0,
1231
+ "step": 1290
1232
+ },
1233
+ {
1234
+ "epoch": 1.1555555555555554,
1235
+ "grad_norm": 1.7621415853500366,
1236
+ "learning_rate": 0.00012683969465648854,
1237
+ "loss": 1.844,
1238
+ "mean_token_accuracy": 0.6506654173135757,
1239
+ "num_tokens": 1462493.0,
1240
+ "step": 1300
1241
+ },
+ {
+ "epoch": 1.1644444444444444,
+ "grad_norm": 1.7811567783355713,
+ "learning_rate": 0.00012622900763358777,
+ "loss": 1.8235,
+ "mean_token_accuracy": 0.6505810797214509,
+ "num_tokens": 1473710.0,
+ "step": 1310
+ },
+ {
+ "epoch": 1.1733333333333333,
+ "grad_norm": 1.9157836437225342,
+ "learning_rate": 0.00012561832061068704,
+ "loss": 1.8885,
+ "mean_token_accuracy": 0.6459546625614166,
+ "num_tokens": 1485215.0,
+ "step": 1320
+ },
+ {
+ "epoch": 1.1822222222222223,
+ "grad_norm": 1.6572569608688354,
+ "learning_rate": 0.00012500763358778627,
+ "loss": 1.813,
+ "mean_token_accuracy": 0.6597578257322312,
+ "num_tokens": 1496371.0,
+ "step": 1330
+ },
+ {
+ "epoch": 1.1911111111111112,
+ "grad_norm": 1.8602449893951416,
+ "learning_rate": 0.0001243969465648855,
+ "loss": 1.8179,
+ "mean_token_accuracy": 0.6519266426563263,
+ "num_tokens": 1508348.0,
+ "step": 1340
+ },
+ {
+ "epoch": 1.2,
+ "grad_norm": 1.8736369609832764,
+ "learning_rate": 0.00012378625954198472,
+ "loss": 1.8029,
+ "mean_token_accuracy": 0.6621162816882133,
+ "num_tokens": 1519322.0,
+ "step": 1350
+ },
+ {
+ "epoch": 1.208888888888889,
+ "grad_norm": 2.026744842529297,
+ "learning_rate": 0.00012317557251908398,
+ "loss": 1.8168,
+ "mean_token_accuracy": 0.6635635286569596,
+ "num_tokens": 1530183.0,
+ "step": 1360
+ },
+ {
+ "epoch": 1.2177777777777778,
+ "grad_norm": 1.7360782623291016,
+ "learning_rate": 0.00012256488549618322,
+ "loss": 1.7521,
+ "mean_token_accuracy": 0.6706348299980164,
+ "num_tokens": 1540862.0,
+ "step": 1370
+ },
+ {
+ "epoch": 1.2266666666666666,
+ "grad_norm": 1.9620578289031982,
+ "learning_rate": 0.00012195419847328244,
+ "loss": 1.8228,
+ "mean_token_accuracy": 0.6569086670875549,
+ "num_tokens": 1552212.0,
+ "step": 1380
+ },
+ {
+ "epoch": 1.2355555555555555,
+ "grad_norm": 1.6294327974319458,
+ "learning_rate": 0.00012134351145038167,
+ "loss": 1.7654,
+ "mean_token_accuracy": 0.6697377026081085,
+ "num_tokens": 1563356.0,
+ "step": 1390
+ },
+ {
+ "epoch": 1.2444444444444445,
+ "grad_norm": 1.7311524152755737,
+ "learning_rate": 0.00012073282442748092,
+ "loss": 1.9019,
+ "mean_token_accuracy": 0.6457875579595566,
+ "num_tokens": 1574569.0,
+ "step": 1400
+ },
+ {
+ "epoch": 1.2444444444444445,
+ "eval_loss": 1.9411770105361938,
+ "eval_mean_token_accuracy": 0.6407178282737732,
+ "eval_num_tokens": 1574569.0,
+ "eval_runtime": 48.3309,
+ "eval_samples_per_second": 20.691,
+ "eval_steps_per_second": 10.345,
+ "step": 1400
+ },
+ {
+ "epoch": 1.2533333333333334,
+ "grad_norm": 1.8629728555679321,
+ "learning_rate": 0.00012012213740458016,
+ "loss": 1.7585,
+ "mean_token_accuracy": 0.671015702188015,
+ "num_tokens": 1585308.0,
+ "step": 1410
+ },
+ {
+ "epoch": 1.2622222222222224,
+ "grad_norm": 1.958808183670044,
+ "learning_rate": 0.0001195114503816794,
+ "loss": 1.8479,
+ "mean_token_accuracy": 0.6535898372530937,
+ "num_tokens": 1596886.0,
+ "step": 1420
+ },
+ {
+ "epoch": 1.271111111111111,
+ "grad_norm": 1.950421690940857,
+ "learning_rate": 0.00011890076335877862,
+ "loss": 1.8173,
+ "mean_token_accuracy": 0.6655478686094284,
+ "num_tokens": 1607683.0,
+ "step": 1430
+ },
+ {
+ "epoch": 1.28,
+ "grad_norm": 1.8152872323989868,
+ "learning_rate": 0.00011829007633587788,
+ "loss": 1.8791,
+ "mean_token_accuracy": 0.6531546950340271,
+ "num_tokens": 1618906.0,
+ "step": 1440
+ },
+ {
+ "epoch": 1.2888888888888888,
+ "grad_norm": 1.7857719659805298,
+ "learning_rate": 0.0001176793893129771,
+ "loss": 1.7887,
+ "mean_token_accuracy": 0.6610255971550941,
+ "num_tokens": 1629981.0,
+ "step": 1450
+ },
+ {
+ "epoch": 1.2977777777777777,
+ "grad_norm": 1.8434971570968628,
+ "learning_rate": 0.00011706870229007634,
+ "loss": 1.8368,
+ "mean_token_accuracy": 0.653369964659214,
+ "num_tokens": 1641429.0,
+ "step": 1460
+ },
+ {
+ "epoch": 1.3066666666666666,
+ "grad_norm": 1.8877320289611816,
+ "learning_rate": 0.00011645801526717557,
+ "loss": 1.7938,
+ "mean_token_accuracy": 0.6639183640480042,
+ "num_tokens": 1652601.0,
+ "step": 1470
+ },
+ {
+ "epoch": 1.3155555555555556,
+ "grad_norm": 1.8121625185012817,
+ "learning_rate": 0.00011584732824427482,
+ "loss": 1.7862,
+ "mean_token_accuracy": 0.661414910852909,
+ "num_tokens": 1663837.0,
+ "step": 1480
+ },
+ {
+ "epoch": 1.3244444444444445,
+ "grad_norm": 1.7919855117797852,
+ "learning_rate": 0.00011523664122137406,
+ "loss": 1.8148,
+ "mean_token_accuracy": 0.6654411420226097,
+ "num_tokens": 1675018.0,
+ "step": 1490
+ },
+ {
+ "epoch": 1.3333333333333333,
+ "grad_norm": 1.828735589981079,
+ "learning_rate": 0.00011462595419847328,
+ "loss": 1.8456,
+ "mean_token_accuracy": 0.6496043875813484,
+ "num_tokens": 1686136.0,
+ "step": 1500
+ },
+ {
+ "epoch": 1.3422222222222222,
+ "grad_norm": 1.9462794065475464,
+ "learning_rate": 0.00011401526717557252,
+ "loss": 1.8412,
+ "mean_token_accuracy": 0.6603908941149712,
+ "num_tokens": 1697160.0,
+ "step": 1510
+ },
+ {
+ "epoch": 1.3511111111111112,
+ "grad_norm": 1.6794313192367554,
+ "learning_rate": 0.00011340458015267177,
+ "loss": 1.7774,
+ "mean_token_accuracy": 0.6664682924747467,
+ "num_tokens": 1707831.0,
+ "step": 1520
+ },
+ {
+ "epoch": 1.3599999999999999,
+ "grad_norm": 1.8189337253570557,
+ "learning_rate": 0.000112793893129771,
+ "loss": 1.8031,
+ "mean_token_accuracy": 0.6627006307244301,
+ "num_tokens": 1719074.0,
+ "step": 1530
+ },
+ {
+ "epoch": 1.3688888888888888,
+ "grad_norm": 2.073533296585083,
+ "learning_rate": 0.00011218320610687022,
+ "loss": 1.8657,
+ "mean_token_accuracy": 0.6476830393075943,
+ "num_tokens": 1730388.0,
+ "step": 1540
+ },
+ {
+ "epoch": 1.3777777777777778,
+ "grad_norm": 2.1564207077026367,
+ "learning_rate": 0.00011157251908396946,
+ "loss": 1.8261,
+ "mean_token_accuracy": 0.6567840203642845,
+ "num_tokens": 1741806.0,
+ "step": 1550
+ },
+ {
+ "epoch": 1.3866666666666667,
+ "grad_norm": 1.6113232374191284,
+ "learning_rate": 0.00011096183206106871,
+ "loss": 1.7753,
+ "mean_token_accuracy": 0.6659888163208961,
+ "num_tokens": 1753313.0,
+ "step": 1560
+ },
+ {
+ "epoch": 1.3955555555555557,
+ "grad_norm": 1.8112174272537231,
+ "learning_rate": 0.00011035114503816795,
+ "loss": 1.8046,
+ "mean_token_accuracy": 0.6593015149235726,
+ "num_tokens": 1765144.0,
+ "step": 1570
+ },
+ {
+ "epoch": 1.4044444444444444,
+ "grad_norm": 1.8377541303634644,
+ "learning_rate": 0.00010974045801526718,
+ "loss": 1.8848,
+ "mean_token_accuracy": 0.6533517614006996,
+ "num_tokens": 1776783.0,
+ "step": 1580
+ },
+ {
+ "epoch": 1.4133333333333333,
+ "grad_norm": 1.8384325504302979,
+ "learning_rate": 0.0001091297709923664,
+ "loss": 1.7669,
+ "mean_token_accuracy": 0.6613995045423507,
+ "num_tokens": 1788274.0,
+ "step": 1590
+ },
+ {
+ "epoch": 1.4222222222222223,
+ "grad_norm": 1.8124533891677856,
+ "learning_rate": 0.00010851908396946567,
+ "loss": 1.8164,
+ "mean_token_accuracy": 0.6591159239411354,
+ "num_tokens": 1799707.0,
+ "step": 1600
+ },
+ {
+ "epoch": 1.4222222222222223,
+ "eval_loss": 1.9286668300628662,
+ "eval_mean_token_accuracy": 0.6434953879117966,
+ "eval_num_tokens": 1799707.0,
+ "eval_runtime": 48.6198,
+ "eval_samples_per_second": 20.568,
+ "eval_steps_per_second": 10.284,
+ "step": 1600
+ },
+ {
+ "epoch": 1.431111111111111,
+ "grad_norm": 1.6931661367416382,
+ "learning_rate": 0.00010790839694656489,
+ "loss": 1.7548,
+ "mean_token_accuracy": 0.664087076485157,
+ "num_tokens": 1810865.0,
+ "step": 1610
+ },
+ {
+ "epoch": 1.44,
+ "grad_norm": 1.7501254081726074,
+ "learning_rate": 0.00010729770992366413,
+ "loss": 1.7652,
+ "mean_token_accuracy": 0.6640020117163659,
+ "num_tokens": 1821807.0,
+ "step": 1620
+ },
+ {
+ "epoch": 1.448888888888889,
+ "grad_norm": 1.8411732912063599,
+ "learning_rate": 0.00010668702290076336,
+ "loss": 1.831,
+ "mean_token_accuracy": 0.6564242169260979,
+ "num_tokens": 1832886.0,
+ "step": 1630
+ },
+ {
+ "epoch": 1.4577777777777778,
+ "grad_norm": 2.003892183303833,
+ "learning_rate": 0.00010607633587786261,
+ "loss": 1.7791,
+ "mean_token_accuracy": 0.6632592365145683,
+ "num_tokens": 1843989.0,
+ "step": 1640
+ },
+ {
+ "epoch": 1.4666666666666668,
+ "grad_norm": 1.7987340688705444,
+ "learning_rate": 0.00010546564885496185,
+ "loss": 1.7627,
+ "mean_token_accuracy": 0.6713873609900475,
+ "num_tokens": 1855106.0,
+ "step": 1650
+ },
+ {
+ "epoch": 1.4755555555555555,
+ "grad_norm": 1.931877851486206,
+ "learning_rate": 0.00010485496183206107,
+ "loss": 1.7976,
+ "mean_token_accuracy": 0.6631382897496223,
+ "num_tokens": 1866900.0,
+ "step": 1660
+ },
+ {
+ "epoch": 1.4844444444444445,
+ "grad_norm": 1.7883687019348145,
+ "learning_rate": 0.0001042442748091603,
+ "loss": 1.7671,
+ "mean_token_accuracy": 0.6675158813595772,
+ "num_tokens": 1877911.0,
+ "step": 1670
+ },
+ {
+ "epoch": 1.4933333333333334,
+ "grad_norm": 1.8195563554763794,
+ "learning_rate": 0.00010363358778625955,
+ "loss": 1.8346,
+ "mean_token_accuracy": 0.652577318251133,
+ "num_tokens": 1889580.0,
+ "step": 1680
+ },
+ {
+ "epoch": 1.5022222222222221,
+ "grad_norm": 1.7439149618148804,
+ "learning_rate": 0.00010302290076335879,
+ "loss": 1.7476,
+ "mean_token_accuracy": 0.6717594474554062,
+ "num_tokens": 1901133.0,
+ "step": 1690
+ },
+ {
+ "epoch": 1.511111111111111,
+ "grad_norm": 1.8155314922332764,
+ "learning_rate": 0.00010241221374045801,
+ "loss": 1.8044,
+ "mean_token_accuracy": 0.6617274522781372,
+ "num_tokens": 1911796.0,
+ "step": 1700
+ },
+ {
+ "epoch": 1.52,
+ "grad_norm": 1.7685112953186035,
+ "learning_rate": 0.00010180152671755725,
+ "loss": 1.7727,
+ "mean_token_accuracy": 0.665304908156395,
+ "num_tokens": 1923217.0,
+ "step": 1710
+ },
+ {
+ "epoch": 1.528888888888889,
+ "grad_norm": 1.737053632736206,
+ "learning_rate": 0.0001011908396946565,
+ "loss": 1.8345,
+ "mean_token_accuracy": 0.6577870160341263,
+ "num_tokens": 1934355.0,
+ "step": 1720
+ },
+ {
+ "epoch": 1.537777777777778,
+ "grad_norm": 1.9686291217803955,
+ "learning_rate": 0.00010058015267175573,
+ "loss": 1.8165,
+ "mean_token_accuracy": 0.6594037398695946,
+ "num_tokens": 1945653.0,
+ "step": 1730
+ },
+ {
+ "epoch": 1.5466666666666666,
+ "grad_norm": 1.844651699066162,
+ "learning_rate": 9.996946564885497e-05,
+ "loss": 1.8273,
+ "mean_token_accuracy": 0.6566928923130035,
+ "num_tokens": 1956891.0,
+ "step": 1740
+ },
+ {
+ "epoch": 1.5555555555555556,
+ "grad_norm": 1.8607743978500366,
+ "learning_rate": 9.93587786259542e-05,
+ "loss": 1.785,
+ "mean_token_accuracy": 0.6692357853055,
+ "num_tokens": 1967789.0,
+ "step": 1750
+ },
+ {
+ "epoch": 1.5644444444444443,
+ "grad_norm": 1.9204373359680176,
+ "learning_rate": 9.874809160305344e-05,
+ "loss": 1.8264,
+ "mean_token_accuracy": 0.6549209818243981,
+ "num_tokens": 1979224.0,
+ "step": 1760
+ },
+ {
+ "epoch": 1.5733333333333333,
+ "grad_norm": 1.7754265069961548,
+ "learning_rate": 9.813740458015268e-05,
+ "loss": 1.7467,
+ "mean_token_accuracy": 0.6670090600848197,
+ "num_tokens": 1990255.0,
+ "step": 1770
+ },
+ {
+ "epoch": 1.5822222222222222,
+ "grad_norm": 2.069091796875,
+ "learning_rate": 9.752671755725191e-05,
+ "loss": 1.7731,
+ "mean_token_accuracy": 0.6609751120209694,
+ "num_tokens": 2001606.0,
+ "step": 1780
+ },
+ {
+ "epoch": 1.5911111111111111,
+ "grad_norm": 2.1375646591186523,
+ "learning_rate": 9.691603053435115e-05,
+ "loss": 1.8009,
+ "mean_token_accuracy": 0.6624869346618653,
+ "num_tokens": 2012912.0,
+ "step": 1790
+ },
+ {
+ "epoch": 1.6,
+ "grad_norm": 1.5623434782028198,
+ "learning_rate": 9.630534351145038e-05,
+ "loss": 1.7383,
+ "mean_token_accuracy": 0.6694582119584084,
+ "num_tokens": 2024571.0,
+ "step": 1800
+ },
+ {
+ "epoch": 1.6,
+ "eval_loss": 1.90510892868042,
+ "eval_mean_token_accuracy": 0.6464553346633911,
+ "eval_num_tokens": 2024571.0,
+ "eval_runtime": 48.9449,
+ "eval_samples_per_second": 20.431,
+ "eval_steps_per_second": 10.216,
+ "step": 1800
+ },
+ {
+ "epoch": 1.608888888888889,
+ "grad_norm": 1.745969295501709,
+ "learning_rate": 9.569465648854963e-05,
+ "loss": 1.7552,
+ "mean_token_accuracy": 0.6786300778388977,
+ "num_tokens": 2035783.0,
+ "step": 1810
+ },
+ {
+ "epoch": 1.6177777777777778,
+ "grad_norm": 1.7463303804397583,
+ "learning_rate": 9.508396946564886e-05,
+ "loss": 1.7495,
+ "mean_token_accuracy": 0.6666959136724472,
+ "num_tokens": 2047304.0,
+ "step": 1820
+ },
+ {
+ "epoch": 1.6266666666666667,
+ "grad_norm": 1.9058139324188232,
+ "learning_rate": 9.44732824427481e-05,
+ "loss": 1.8365,
+ "mean_token_accuracy": 0.6536470741033554,
+ "num_tokens": 2058792.0,
+ "step": 1830
+ },
+ {
+ "epoch": 1.6355555555555554,
+ "grad_norm": 2.065488576889038,
+ "learning_rate": 9.386259541984733e-05,
+ "loss": 1.7939,
+ "mean_token_accuracy": 0.6519258007407188,
+ "num_tokens": 2070175.0,
+ "step": 1840
+ },
+ {
+ "epoch": 1.6444444444444444,
+ "grad_norm": 1.778023600578308,
+ "learning_rate": 9.325190839694658e-05,
+ "loss": 1.8155,
+ "mean_token_accuracy": 0.655296416580677,
+ "num_tokens": 2081343.0,
+ "step": 1850
+ },
+ {
+ "epoch": 1.6533333333333333,
+ "grad_norm": 1.7437517642974854,
+ "learning_rate": 9.26412213740458e-05,
+ "loss": 1.7996,
+ "mean_token_accuracy": 0.6618543311953544,
+ "num_tokens": 2093074.0,
+ "step": 1860
+ },
+ {
+ "epoch": 1.6622222222222223,
+ "grad_norm": 1.7666471004486084,
+ "learning_rate": 9.203053435114505e-05,
+ "loss": 1.7658,
+ "mean_token_accuracy": 0.6631957843899727,
+ "num_tokens": 2104640.0,
+ "step": 1870
+ },
+ {
+ "epoch": 1.6711111111111112,
+ "grad_norm": 1.912842869758606,
+ "learning_rate": 9.141984732824428e-05,
+ "loss": 1.7996,
+ "mean_token_accuracy": 0.6606781020760536,
+ "num_tokens": 2115628.0,
+ "step": 1880
+ },
+ {
+ "epoch": 1.6800000000000002,
+ "grad_norm": 1.7230331897735596,
+ "learning_rate": 9.080916030534351e-05,
+ "loss": 1.8042,
+ "mean_token_accuracy": 0.6600380197167397,
+ "num_tokens": 2126505.0,
+ "step": 1890
+ },
+ {
+ "epoch": 1.6888888888888889,
+ "grad_norm": 1.7043401002883911,
+ "learning_rate": 9.019847328244276e-05,
+ "loss": 1.7993,
+ "mean_token_accuracy": 0.6613149493932724,
+ "num_tokens": 2138364.0,
+ "step": 1900
+ },
+ {
+ "epoch": 1.6977777777777778,
+ "grad_norm": 1.9145572185516357,
+ "learning_rate": 8.958778625954198e-05,
+ "loss": 1.8046,
+ "mean_token_accuracy": 0.662477345764637,
+ "num_tokens": 2149425.0,
+ "step": 1910
+ },
+ {
+ "epoch": 1.7066666666666666,
+ "grad_norm": 1.7448140382766724,
+ "learning_rate": 8.897709923664123e-05,
+ "loss": 1.8004,
+ "mean_token_accuracy": 0.6539181426167489,
+ "num_tokens": 2160843.0,
+ "step": 1920
+ },
+ {
+ "epoch": 1.7155555555555555,
+ "grad_norm": 1.8304840326309204,
+ "learning_rate": 8.836641221374045e-05,
+ "loss": 1.8404,
+ "mean_token_accuracy": 0.6593489304184914,
+ "num_tokens": 2172044.0,
+ "step": 1930
+ },
+ {
+ "epoch": 1.7244444444444444,
+ "grad_norm": 1.802331566810608,
+ "learning_rate": 8.77557251908397e-05,
+ "loss": 1.7995,
+ "mean_token_accuracy": 0.6634193584322929,
+ "num_tokens": 2182916.0,
+ "step": 1940
+ },
+ {
+ "epoch": 1.7333333333333334,
+ "grad_norm": 1.9834682941436768,
+ "learning_rate": 8.714503816793894e-05,
+ "loss": 1.7525,
+ "mean_token_accuracy": 0.6685526207089424,
+ "num_tokens": 2194913.0,
+ "step": 1950
+ },
+ {
+ "epoch": 1.7422222222222223,
+ "grad_norm": 1.8077235221862793,
+ "learning_rate": 8.653435114503817e-05,
+ "loss": 1.7612,
+ "mean_token_accuracy": 0.6704939991235733,
+ "num_tokens": 2205721.0,
+ "step": 1960
+ },
+ {
+ "epoch": 1.751111111111111,
+ "grad_norm": 1.957993745803833,
+ "learning_rate": 8.592366412213741e-05,
+ "loss": 1.8059,
+ "mean_token_accuracy": 0.6547697961330414,
+ "num_tokens": 2217489.0,
+ "step": 1970
+ },
+ {
+ "epoch": 1.76,
+ "grad_norm": 1.7215981483459473,
+ "learning_rate": 8.531297709923664e-05,
+ "loss": 1.7913,
+ "mean_token_accuracy": 0.657075221836567,
+ "num_tokens": 2228972.0,
+ "step": 1980
+ },
+ {
+ "epoch": 1.7688888888888887,
+ "grad_norm": 1.8760231733322144,
+ "learning_rate": 8.470229007633588e-05,
+ "loss": 1.7923,
+ "mean_token_accuracy": 0.6629065066576004,
+ "num_tokens": 2240239.0,
+ "step": 1990
+ },
+ {
+ "epoch": 1.7777777777777777,
+ "grad_norm": 2.092407703399658,
+ "learning_rate": 8.409160305343512e-05,
+ "loss": 1.7593,
+ "mean_token_accuracy": 0.6686230883002281,
+ "num_tokens": 2251436.0,
+ "step": 2000
+ },
+ {
+ "epoch": 1.7777777777777777,
+ "eval_loss": 1.893255591392517,
+ "eval_mean_token_accuracy": 0.6482590944766998,
+ "eval_num_tokens": 2251436.0,
+ "eval_runtime": 49.0676,
+ "eval_samples_per_second": 20.38,
+ "eval_steps_per_second": 10.19,
+ "step": 2000
+ },
+ {
+ "epoch": 1.7866666666666666,
+ "grad_norm": 1.7836107015609741,
+ "learning_rate": 8.348091603053435e-05,
+ "loss": 1.8033,
+ "mean_token_accuracy": 0.6598399996757507,
+ "num_tokens": 2263069.0,
+ "step": 2010
+ },
+ {
+ "epoch": 1.7955555555555556,
+ "grad_norm": 1.7955141067504883,
+ "learning_rate": 8.287022900763359e-05,
+ "loss": 1.7922,
+ "mean_token_accuracy": 0.6619856491684913,
+ "num_tokens": 2274050.0,
+ "step": 2020
+ },
+ {
+ "epoch": 1.8044444444444445,
+ "grad_norm": 1.7887564897537231,
+ "learning_rate": 8.225954198473282e-05,
+ "loss": 1.8353,
+ "mean_token_accuracy": 0.658150726556778,
+ "num_tokens": 2285060.0,
+ "step": 2030
+ },
+ {
+ "epoch": 1.8133333333333335,
+ "grad_norm": 1.8892567157745361,
+ "learning_rate": 8.164885496183207e-05,
+ "loss": 1.7266,
+ "mean_token_accuracy": 0.6728688895702362,
+ "num_tokens": 2296211.0,
+ "step": 2040
+ },
+ {
+ "epoch": 1.8222222222222222,
+ "grad_norm": 1.9226106405258179,
+ "learning_rate": 8.10381679389313e-05,
+ "loss": 1.7243,
+ "mean_token_accuracy": 0.6712497785687447,
+ "num_tokens": 2307184.0,
+ "step": 2050
+ },
+ {
+ "epoch": 1.8311111111111111,
+ "grad_norm": 1.735863208770752,
+ "learning_rate": 8.042748091603054e-05,
+ "loss": 1.7739,
+ "mean_token_accuracy": 0.6621047109365463,
+ "num_tokens": 2318602.0,
+ "step": 2060
+ },
+ {
+ "epoch": 1.8399999999999999,
+ "grad_norm": 1.8361355066299438,
+ "learning_rate": 7.981679389312977e-05,
+ "loss": 1.8223,
+ "mean_token_accuracy": 0.6560095950961113,
+ "num_tokens": 2330193.0,
+ "step": 2070
+ },
+ {
+ "epoch": 1.8488888888888888,
+ "grad_norm": 1.8159486055374146,
+ "learning_rate": 7.920610687022902e-05,
+ "loss": 1.7695,
+ "mean_token_accuracy": 0.6657541528344154,
+ "num_tokens": 2341442.0,
+ "step": 2080
+ },
+ {
+ "epoch": 1.8577777777777778,
+ "grad_norm": 1.9189419746398926,
+ "learning_rate": 7.859541984732824e-05,
+ "loss": 1.8333,
+ "mean_token_accuracy": 0.6628425523638726,
+ "num_tokens": 2352479.0,
+ "step": 2090
+ },
+ {
+ "epoch": 1.8666666666666667,
+ "grad_norm": 1.8809512853622437,
+ "learning_rate": 7.798473282442749e-05,
+ "loss": 1.7371,
+ "mean_token_accuracy": 0.6683435723185539,
+ "num_tokens": 2363642.0,
+ "step": 2100
+ },
+ {
+ "epoch": 1.8755555555555556,
+ "grad_norm": 1.845886468887329,
+ "learning_rate": 7.737404580152672e-05,
+ "loss": 1.7774,
+ "mean_token_accuracy": 0.6559944331645966,
+ "num_tokens": 2375376.0,
+ "step": 2110
+ },
+ {
+ "epoch": 1.8844444444444446,
+ "grad_norm": 1.7780894041061401,
+ "learning_rate": 7.676335877862596e-05,
+ "loss": 1.7823,
+ "mean_token_accuracy": 0.6601730152964592,
+ "num_tokens": 2386944.0,
+ "step": 2120
+ },
+ {
+ "epoch": 1.8933333333333333,
+ "grad_norm": 1.9167022705078125,
+ "learning_rate": 7.61526717557252e-05,
+ "loss": 1.7869,
+ "mean_token_accuracy": 0.6573449537158013,
+ "num_tokens": 2398391.0,
+ "step": 2130
+ },
+ {
+ "epoch": 1.9022222222222223,
+ "grad_norm": 2.037911891937256,
+ "learning_rate": 7.554198473282443e-05,
+ "loss": 1.7858,
+ "mean_token_accuracy": 0.6593190267682075,
+ "num_tokens": 2409837.0,
+ "step": 2140
+ },
+ {
+ "epoch": 1.911111111111111,
+ "grad_norm": 1.7496647834777832,
+ "learning_rate": 7.493129770992367e-05,
+ "loss": 1.7241,
+ "mean_token_accuracy": 0.6702290028333664,
+ "num_tokens": 2421607.0,
+ "step": 2150
+ },
+ {
+ "epoch": 1.92,
+ "grad_norm": 2.0227596759796143,
+ "learning_rate": 7.43206106870229e-05,
+ "loss": 1.7731,
+ "mean_token_accuracy": 0.6679618924856185,
+ "num_tokens": 2432376.0,
+ "step": 2160
+ },
+ {
+ "epoch": 1.9288888888888889,
+ "grad_norm": 1.7401562929153442,
+ "learning_rate": 7.370992366412214e-05,
+ "loss": 1.7684,
+ "mean_token_accuracy": 0.6676609605550766,
+ "num_tokens": 2443683.0,
+ "step": 2170
+ },
+ {
+ "epoch": 1.9377777777777778,
+ "grad_norm": 2.709106922149658,
+ "learning_rate": 7.309923664122137e-05,
+ "loss": 1.709,
+ "mean_token_accuracy": 0.6738818466663361,
+ "num_tokens": 2454757.0,
+ "step": 2180
+ },
+ {
+ "epoch": 1.9466666666666668,
+ "grad_norm": 1.8504191637039185,
+ "learning_rate": 7.248854961832061e-05,
+ "loss": 1.7411,
+ "mean_token_accuracy": 0.6681609645485878,
+ "num_tokens": 2465562.0,
+ "step": 2190
+ },
+ {
+ "epoch": 1.9555555555555557,
+ "grad_norm": 1.9488162994384766,
+ "learning_rate": 7.187786259541986e-05,
+ "loss": 1.7927,
+ "mean_token_accuracy": 0.6587553441524505,
+ "num_tokens": 2476869.0,
+ "step": 2200
+ },
+ {
+ "epoch": 1.9555555555555557,
+ "eval_loss": 1.8803235292434692,
+ "eval_mean_token_accuracy": 0.6499251070022583,
+ "eval_num_tokens": 2476869.0,
+ "eval_runtime": 47.7648,
+ "eval_samples_per_second": 20.936,
+ "eval_steps_per_second": 10.468,
+ "step": 2200
+ },
+ {
+ "epoch": 1.9644444444444444,
+ "grad_norm": 1.9747337102890015,
+ "learning_rate": 7.132824427480917e-05,
+ "loss": 1.7689,
+ "mean_token_accuracy": 0.666295376420021,
+ "num_tokens": 2487704.0,
+ "step": 2210
+ },
+ {
+ "epoch": 1.9733333333333334,
+ "grad_norm": 1.8904316425323486,
+ "learning_rate": 7.071755725190839e-05,
+ "loss": 1.7538,
+ "mean_token_accuracy": 0.6645636394619941,
+ "num_tokens": 2498918.0,
+ "step": 2220
+ },
+ {
+ "epoch": 1.982222222222222,
+ "grad_norm": 1.8791844844818115,
+ "learning_rate": 7.010687022900764e-05,
+ "loss": 1.7926,
+ "mean_token_accuracy": 0.6631673067808151,
+ "num_tokens": 2509728.0,
+ "step": 2230
+ },
+ {
+ "epoch": 1.991111111111111,
+ "grad_norm": 1.9756606817245483,
+ "learning_rate": 6.949618320610687e-05,
+ "loss": 1.7863,
+ "mean_token_accuracy": 0.6628521859645844,
+ "num_tokens": 2521073.0,
+ "step": 2240
+ },
+ {
+ "epoch": 2.0,
+ "grad_norm": 1.7894699573516846,
+ "learning_rate": 6.888549618320611e-05,
+ "loss": 1.7539,
+ "mean_token_accuracy": 0.6728802308440208,
+ "num_tokens": 2531820.0,
+ "step": 2250
+ },
+ {
+ "epoch": 2.008888888888889,
+ "grad_norm": 1.702850341796875,
+ "learning_rate": 6.827480916030535e-05,
+ "loss": 1.4903,
+ "mean_token_accuracy": 0.7138098135590554,
+ "num_tokens": 2542512.0,
+ "step": 2260
+ },
+ {
+ "epoch": 2.017777777777778,
+ "grad_norm": 1.7931528091430664,
+ "learning_rate": 6.766412213740458e-05,
+ "loss": 1.601,
+ "mean_token_accuracy": 0.6894692406058311,
+ "num_tokens": 2553338.0,
+ "step": 2270
+ },
+ {
+ "epoch": 2.026666666666667,
+ "grad_norm": 2.228480339050293,
+ "learning_rate": 6.705343511450382e-05,
+ "loss": 1.609,
+ "mean_token_accuracy": 0.6943154886364937,
+ "num_tokens": 2564182.0,
+ "step": 2280
+ },
+ {
+ "epoch": 2.0355555555555553,
+ "grad_norm": 1.9658042192459106,
+ "learning_rate": 6.644274809160305e-05,
+ "loss": 1.6545,
+ "mean_token_accuracy": 0.6824306204915047,
+ "num_tokens": 2575789.0,
+ "step": 2290
+ },
+ {
+ "epoch": 2.0444444444444443,
+ "grad_norm": 1.7540594339370728,
+ "learning_rate": 6.583206106870229e-05,
+ "loss": 1.6229,
+ "mean_token_accuracy": 0.6881745710968972,
+ "num_tokens": 2587147.0,
+ "step": 2300
+ },
+ {
+ "epoch": 2.0533333333333332,
+ "grad_norm": 1.799501895904541,
+ "learning_rate": 6.522137404580153e-05,
+ "loss": 1.6119,
+ "mean_token_accuracy": 0.6896049126982688,
+ "num_tokens": 2598282.0,
+ "step": 2310
+ },
+ {
+ "epoch": 2.062222222222222,
+ "grad_norm": 1.7720867395401,
+ "learning_rate": 6.461068702290076e-05,
+ "loss": 1.5519,
+ "mean_token_accuracy": 0.7038252353668213,
+ "num_tokens": 2609125.0,
+ "step": 2320
+ },
+ {
+ "epoch": 2.071111111111111,
+ "grad_norm": 1.994992971420288,
+ "learning_rate": 6.400000000000001e-05,
+ "loss": 1.5872,
+ "mean_token_accuracy": 0.690100908279419,
+ "num_tokens": 2620411.0,
+ "step": 2330
+ },
+ {
+ "epoch": 2.08,
+ "grad_norm": 1.9283640384674072,
+ "learning_rate": 6.338931297709923e-05,
+ "loss": 1.5867,
+ "mean_token_accuracy": 0.6923216238617897,
+ "num_tokens": 2631795.0,
+ "step": 2340
+ },
+ {
+ "epoch": 2.088888888888889,
+ "grad_norm": 1.9957973957061768,
+ "learning_rate": 6.277862595419848e-05,
+ "loss": 1.5996,
+ "mean_token_accuracy": 0.6924369186162949,
+ "num_tokens": 2643179.0,
+ "step": 2350
+ },
+ {
+ "epoch": 2.097777777777778,
+ "grad_norm": 2.0207560062408447,
+ "learning_rate": 6.21679389312977e-05,
+ "loss": 1.515,
+ "mean_token_accuracy": 0.7066755428910255,
+ "num_tokens": 2654206.0,
+ "step": 2360
+ },
+ {
+ "epoch": 2.1066666666666665,
+ "grad_norm": 1.8871878385543823,
+ "learning_rate": 6.155725190839695e-05,
+ "loss": 1.6139,
+ "mean_token_accuracy": 0.687422800064087,
+ "num_tokens": 2665582.0,
+ "step": 2370
+ },
+ {
+ "epoch": 2.1155555555555554,
+ "grad_norm": 1.717610478401184,
+ "learning_rate": 6.094656488549618e-05,
+ "loss": 1.6388,
+ "mean_token_accuracy": 0.6870575189590454,
+ "num_tokens": 2677533.0,
+ "step": 2380
+ },
+ {
+ "epoch": 2.1244444444444444,
+ "grad_norm": 1.8574187755584717,
+ "learning_rate": 6.0335877862595426e-05,
+ "loss": 1.557,
+ "mean_token_accuracy": 0.6999430671334267,
+ "num_tokens": 2688755.0,
+ "step": 2390
+ },
+ {
+ "epoch": 2.1333333333333333,
+ "grad_norm": 1.9739580154418945,
+ "learning_rate": 5.9725190839694655e-05,
+ "loss": 1.6553,
+ "mean_token_accuracy": 0.6819543272256852,
+ "num_tokens": 2700558.0,
+ "step": 2400
+ },
+ {
+ "epoch": 2.1333333333333333,
+ "eval_loss": 1.8970768451690674,
+ "eval_mean_token_accuracy": 0.6490416256189346,
+ "eval_num_tokens": 2700558.0,
+ "eval_runtime": 47.6704,
+ "eval_samples_per_second": 20.977,
+ "eval_steps_per_second": 10.489,
+ "step": 2400
+ },
+ {
+ "epoch": 2.1422222222222222,
+ "grad_norm": 1.893918514251709,
+ "learning_rate": 5.91145038167939e-05,
+ "loss": 1.5459,
+ "mean_token_accuracy": 0.6963777393102646,
+ "num_tokens": 2711713.0,
+ "step": 2410
+ },
+ {
+ "epoch": 2.151111111111111,
+ "grad_norm": 1.9607445001602173,
+ "learning_rate": 5.850381679389313e-05,
+ "loss": 1.6373,
+ "mean_token_accuracy": 0.6815788432955742,
+ "num_tokens": 2723686.0,
+ "step": 2420
+ },
+ {
+ "epoch": 2.16,
+ "grad_norm": 2.091732978820801,
+ "learning_rate": 5.789312977099237e-05,
+ "loss": 1.6422,
+ "mean_token_accuracy": 0.6811213716864586,
+ "num_tokens": 2735300.0,
+ "step": 2430
+ },
+ {
+ "epoch": 2.168888888888889,
+ "grad_norm": 2.1138076782226562,
+ "learning_rate": 5.7282442748091605e-05,
+ "loss": 1.5848,
+ "mean_token_accuracy": 0.6962573245167732,
+ "num_tokens": 2746248.0,
+ "step": 2440
+ },
+ {
+ "epoch": 2.1777777777777776,
+ "grad_norm": 2.1495392322540283,
+ "learning_rate": 5.667175572519085e-05,
+ "loss": 1.576,
+ "mean_token_accuracy": 0.6990228727459907,
+ "num_tokens": 2757259.0,
+ "step": 2450
+ },
+ {
+ "epoch": 2.1866666666666665,
+ "grad_norm": 2.1444251537323,
+ "learning_rate": 5.606106870229008e-05,
+ "loss": 1.5979,
+ "mean_token_accuracy": 0.6916472837328911,
+ "num_tokens": 2768228.0,
+ "step": 2460
+ },
+ {
+ "epoch": 2.1955555555555555,
+ "grad_norm": 1.945489525794983,
+ "learning_rate": 5.545038167938932e-05,
+ "loss": 1.5663,
+ "mean_token_accuracy": 0.7005513325333595,
+ "num_tokens": 2779254.0,
+ "step": 2470
+ },
+ {
+ "epoch": 2.2044444444444444,
+ "grad_norm": 1.8256646394729614,
+ "learning_rate": 5.483969465648855e-05,
2359
+ "loss": 1.5751,
2360
+ "mean_token_accuracy": 0.6961624413728714,
2361
+ "num_tokens": 2790326.0,
2362
+ "step": 2480
2363
+ },
2364
+ {
2365
+ "epoch": 2.2133333333333334,
2366
+ "grad_norm": 1.9541441202163696,
2367
+ "learning_rate": 5.422900763358779e-05,
2368
+ "loss": 1.6268,
2369
+ "mean_token_accuracy": 0.6893054991960526,
2370
+ "num_tokens": 2801625.0,
2371
+ "step": 2490
2372
+ },
2373
+ {
2374
+ "epoch": 2.2222222222222223,
2375
+ "grad_norm": 2.0127615928649902,
2376
+ "learning_rate": 5.361832061068702e-05,
2377
+ "loss": 1.6096,
2378
+ "mean_token_accuracy": 0.6923437744379044,
2379
+ "num_tokens": 2813010.0,
2380
+ "step": 2500
2381
+ },
2382
+ {
2383
+ "epoch": 2.2311111111111113,
2384
+ "grad_norm": 2.0325839519500732,
2385
+ "learning_rate": 5.300763358778626e-05,
2386
+ "loss": 1.5963,
2387
+ "mean_token_accuracy": 0.6913090571761131,
2388
+ "num_tokens": 2824021.0,
2389
+ "step": 2510
2390
+ },
2391
+ {
2392
+ "epoch": 2.24,
2393
+ "grad_norm": 2.1595821380615234,
2394
+ "learning_rate": 5.23969465648855e-05,
2395
+ "loss": 1.5617,
2396
+ "mean_token_accuracy": 0.7037980020046234,
2397
+ "num_tokens": 2835232.0,
2398
+ "step": 2520
2399
+ },
2400
+ {
2401
+ "epoch": 2.2488888888888887,
2402
+ "grad_norm": 2.11661958694458,
2403
+ "learning_rate": 5.178625954198474e-05,
2404
+ "loss": 1.6213,
2405
+ "mean_token_accuracy": 0.6836483731865883,
2406
+ "num_tokens": 2846524.0,
2407
+ "step": 2530
2408
+ },
2409
+ {
2410
+ "epoch": 2.2577777777777777,
2411
+ "grad_norm": 1.88747239112854,
2412
+ "learning_rate": 5.117557251908397e-05,
2413
+ "loss": 1.6408,
2414
+ "mean_token_accuracy": 0.6860729962587356,
2415
+ "num_tokens": 2857788.0,
2416
+ "step": 2540
2417
+ },
2418
+ {
2419
+ "epoch": 2.2666666666666666,
2420
+ "grad_norm": 1.9622093439102173,
2421
+ "learning_rate": 5.056488549618321e-05,
2422
+ "loss": 1.5519,
2423
+ "mean_token_accuracy": 0.7002682030200958,
2424
+ "num_tokens": 2868618.0,
2425
+ "step": 2550
2426
+ },
2427
+ {
2428
+ "epoch": 2.2755555555555556,
2429
+ "grad_norm": 1.9343371391296387,
2430
+ "learning_rate": 4.995419847328244e-05,
2431
+ "loss": 1.5795,
2432
+ "mean_token_accuracy": 0.6934511423110962,
2433
+ "num_tokens": 2879999.0,
2434
+ "step": 2560
2435
+ },
2436
+ {
2437
+ "epoch": 2.2844444444444445,
2438
+ "grad_norm": 1.9991627931594849,
2439
+ "learning_rate": 4.934351145038168e-05,
2440
+ "loss": 1.6183,
2441
+ "mean_token_accuracy": 0.6901679039001465,
2442
+ "num_tokens": 2891053.0,
2443
+ "step": 2570
2444
+ },
2445
+ {
2446
+ "epoch": 2.2933333333333334,
2447
+ "grad_norm": 1.9480003118515015,
2448
+ "learning_rate": 4.8732824427480914e-05,
2449
+ "loss": 1.5826,
2450
+ "mean_token_accuracy": 0.7007558569312096,
2451
+ "num_tokens": 2901905.0,
2452
+ "step": 2580
2453
+ },
2454
+ {
2455
+ "epoch": 2.3022222222222224,
2456
+ "grad_norm": 2.021207332611084,
2457
+ "learning_rate": 4.812213740458015e-05,
2458
+ "loss": 1.6348,
2459
+ "mean_token_accuracy": 0.6848765298724174,
2460
+ "num_tokens": 2913571.0,
2461
+ "step": 2590
2462
+ },
2463
+ {
2464
+ "epoch": 2.311111111111111,
2465
+ "grad_norm": 1.8385164737701416,
2466
+ "learning_rate": 4.751145038167939e-05,
2467
+ "loss": 1.5763,
2468
+ "mean_token_accuracy": 0.6912240386009216,
2469
+ "num_tokens": 2925533.0,
2470
+ "step": 2600
2471
+ },
2472
+ {
2473
+ "epoch": 2.311111111111111,
2474
+ "eval_loss": 1.8940143585205078,
2475
+ "eval_mean_token_accuracy": 0.6499911918640137,
2476
+ "eval_num_tokens": 2925533.0,
2477
+ "eval_runtime": 47.456,
2478
+ "eval_samples_per_second": 21.072,
2479
+ "eval_steps_per_second": 10.536,
2480
+ "step": 2600
2481
+ },
2482
+ {
2483
+ "epoch": 2.32,
2484
+ "grad_norm": 1.9455375671386719,
2485
+ "learning_rate": 4.690076335877863e-05,
2486
+ "loss": 1.598,
2487
+ "mean_token_accuracy": 0.6915700435638428,
2488
+ "num_tokens": 2936620.0,
2489
+ "step": 2610
2490
+ },
2491
+ {
2492
+ "epoch": 2.328888888888889,
2493
+ "grad_norm": 1.863487720489502,
2494
+ "learning_rate": 4.6290076335877864e-05,
2495
+ "loss": 1.5512,
2496
+ "mean_token_accuracy": 0.7025073647499085,
2497
+ "num_tokens": 2947753.0,
2498
+ "step": 2620
2499
+ },
2500
+ {
2501
+ "epoch": 2.3377777777777777,
2502
+ "grad_norm": 1.9756685495376587,
2503
+ "learning_rate": 4.56793893129771e-05,
2504
+ "loss": 1.5973,
2505
+ "mean_token_accuracy": 0.6870647758245468,
2506
+ "num_tokens": 2959635.0,
2507
+ "step": 2630
2508
+ },
2509
+ {
2510
+ "epoch": 2.3466666666666667,
2511
+ "grad_norm": 2.190765142440796,
2512
+ "learning_rate": 4.5068702290076336e-05,
2513
+ "loss": 1.5948,
2514
+ "mean_token_accuracy": 0.6888303905725479,
2515
+ "num_tokens": 2971675.0,
2516
+ "step": 2640
2517
+ },
2518
+ {
2519
+ "epoch": 2.3555555555555556,
2520
+ "grad_norm": 1.827318787574768,
2521
+ "learning_rate": 4.445801526717557e-05,
2522
+ "loss": 1.5682,
2523
+ "mean_token_accuracy": 0.6952902913093567,
2524
+ "num_tokens": 2982744.0,
2525
+ "step": 2650
2526
+ },
2527
+ {
2528
+ "epoch": 2.3644444444444446,
2529
+ "grad_norm": 2.11799693107605,
2530
+ "learning_rate": 4.384732824427481e-05,
2531
+ "loss": 1.6221,
2532
+ "mean_token_accuracy": 0.6794109031558037,
2533
+ "num_tokens": 2994347.0,
2534
+ "step": 2660
2535
+ },
2536
+ {
2537
+ "epoch": 2.3733333333333335,
2538
+ "grad_norm": 2.1472220420837402,
2539
+ "learning_rate": 4.3236641221374044e-05,
2540
+ "loss": 1.6353,
2541
+ "mean_token_accuracy": 0.6876759916543961,
2542
+ "num_tokens": 3005174.0,
2543
+ "step": 2670
2544
+ },
2545
+ {
2546
+ "epoch": 2.3822222222222225,
2547
+ "grad_norm": 1.9971054792404175,
2548
+ "learning_rate": 4.2625954198473286e-05,
2549
+ "loss": 1.5372,
2550
+ "mean_token_accuracy": 0.7059834420680999,
2551
+ "num_tokens": 3016492.0,
2552
+ "step": 2680
2553
+ },
2554
+ {
2555
+ "epoch": 2.391111111111111,
2556
+ "grad_norm": 2.067861318588257,
2557
+ "learning_rate": 4.201526717557252e-05,
2558
+ "loss": 1.572,
2559
+ "mean_token_accuracy": 0.6911077201366425,
2560
+ "num_tokens": 3027826.0,
2561
+ "step": 2690
2562
+ },
2563
+ {
2564
+ "epoch": 2.4,
2565
+ "grad_norm": 2.0372536182403564,
2566
+ "learning_rate": 4.140458015267176e-05,
2567
+ "loss": 1.5615,
2568
+ "mean_token_accuracy": 0.6972797185182571,
2569
+ "num_tokens": 3038770.0,
2570
+ "step": 2700
2571
+ },
2572
+ {
2573
+ "epoch": 2.408888888888889,
2574
+ "grad_norm": 2.15972638130188,
2575
+ "learning_rate": 4.0793893129770994e-05,
2576
+ "loss": 1.5806,
2577
+ "mean_token_accuracy": 0.6947444006800652,
2578
+ "num_tokens": 3050159.0,
2579
+ "step": 2710
2580
+ },
2581
+ {
2582
+ "epoch": 2.417777777777778,
2583
+ "grad_norm": 2.059760808944702,
2584
+ "learning_rate": 4.018320610687023e-05,
2585
+ "loss": 1.6167,
2586
+ "mean_token_accuracy": 0.6882677704095841,
2587
+ "num_tokens": 3061009.0,
2588
+ "step": 2720
2589
+ },
2590
+ {
2591
+ "epoch": 2.4266666666666667,
2592
+ "grad_norm": 1.9914629459381104,
2593
+ "learning_rate": 3.9572519083969466e-05,
2594
+ "loss": 1.5508,
2595
+ "mean_token_accuracy": 0.6985371947288513,
2596
+ "num_tokens": 3072232.0,
2597
+ "step": 2730
2598
+ },
2599
+ {
2600
+ "epoch": 2.4355555555555557,
2601
+ "grad_norm": 2.0151119232177734,
2602
+ "learning_rate": 3.89618320610687e-05,
2603
+ "loss": 1.663,
2604
+ "mean_token_accuracy": 0.6849021047353745,
2605
+ "num_tokens": 3083939.0,
2606
+ "step": 2740
2607
+ },
2608
+ {
2609
+ "epoch": 2.4444444444444446,
2610
+ "grad_norm": 2.02457332611084,
2611
+ "learning_rate": 3.835114503816794e-05,
2612
+ "loss": 1.6043,
2613
+ "mean_token_accuracy": 0.6891427770256996,
2614
+ "num_tokens": 3095354.0,
2615
+ "step": 2750
2616
+ },
2617
+ {
2618
+ "epoch": 2.453333333333333,
2619
+ "grad_norm": 1.930341362953186,
2620
+ "learning_rate": 3.774045801526718e-05,
2621
+ "loss": 1.5648,
2622
+ "mean_token_accuracy": 0.6962095096707344,
2623
+ "num_tokens": 3106679.0,
2624
+ "step": 2760
2625
+ },
2626
+ {
2627
+ "epoch": 2.462222222222222,
2628
+ "grad_norm": 2.1718850135803223,
2629
+ "learning_rate": 3.7129770992366416e-05,
2630
+ "loss": 1.5514,
2631
+ "mean_token_accuracy": 0.6997211873531342,
2632
+ "num_tokens": 3117440.0,
2633
+ "step": 2770
2634
+ },
2635
+ {
2636
+ "epoch": 2.471111111111111,
2637
+ "grad_norm": 1.89506196975708,
2638
+ "learning_rate": 3.651908396946565e-05,
2639
+ "loss": 1.6102,
2640
+ "mean_token_accuracy": 0.6865462198853493,
2641
+ "num_tokens": 3128685.0,
2642
+ "step": 2780
2643
+ },
2644
+ {
2645
+ "epoch": 2.48,
2646
+ "grad_norm": 2.1102652549743652,
2647
+ "learning_rate": 3.590839694656489e-05,
2648
+ "loss": 1.6092,
2649
+ "mean_token_accuracy": 0.6845578849315643,
2650
+ "num_tokens": 3140574.0,
2651
+ "step": 2790
2652
+ },
2653
+ {
2654
+ "epoch": 2.488888888888889,
2655
+ "grad_norm": 1.9541523456573486,
2656
+ "learning_rate": 3.5297709923664124e-05,
2657
+ "loss": 1.6245,
2658
+ "mean_token_accuracy": 0.6867643877863884,
2659
+ "num_tokens": 3151937.0,
2660
+ "step": 2800
2661
+ },
2662
+ {
2663
+ "epoch": 2.488888888888889,
2664
+ "eval_loss": 1.8869248628616333,
2665
+ "eval_mean_token_accuracy": 0.6508636207580566,
2666
+ "eval_num_tokens": 3151937.0,
2667
+ "eval_runtime": 46.9872,
2668
+ "eval_samples_per_second": 21.282,
2669
+ "eval_steps_per_second": 10.641,
2670
+ "step": 2800
2671
+ },
2672
+ {
2673
+ "epoch": 2.497777777777778,
2674
+ "grad_norm": 2.006448984146118,
2675
+ "learning_rate": 3.468702290076336e-05,
2676
+ "loss": 1.6458,
2677
+ "mean_token_accuracy": 0.6835160732269288,
2678
+ "num_tokens": 3163343.0,
2679
+ "step": 2810
2680
+ },
2681
+ {
2682
+ "epoch": 2.506666666666667,
2683
+ "grad_norm": 2.0644562244415283,
2684
+ "learning_rate": 3.4076335877862595e-05,
2685
+ "loss": 1.5841,
2686
+ "mean_token_accuracy": 0.699130979180336,
2687
+ "num_tokens": 3174278.0,
2688
+ "step": 2820
2689
+ },
2690
+ {
2691
+ "epoch": 2.5155555555555553,
2692
+ "grad_norm": 2.5352766513824463,
2693
+ "learning_rate": 3.346564885496183e-05,
2694
+ "loss": 1.6411,
2695
+ "mean_token_accuracy": 0.687686163187027,
2696
+ "num_tokens": 3185529.0,
2697
+ "step": 2830
2698
+ },
2699
+ {
2700
+ "epoch": 2.5244444444444447,
2701
+ "grad_norm": 2.2506706714630127,
2702
+ "learning_rate": 3.2854961832061074e-05,
2703
+ "loss": 1.5334,
2704
+ "mean_token_accuracy": 0.7042266175150871,
2705
+ "num_tokens": 3196422.0,
2706
+ "step": 2840
2707
+ },
2708
+ {
2709
+ "epoch": 2.533333333333333,
2710
+ "grad_norm": 2.038456439971924,
2711
+ "learning_rate": 3.224427480916031e-05,
2712
+ "loss": 1.5226,
2713
+ "mean_token_accuracy": 0.7002356797456741,
2714
+ "num_tokens": 3207640.0,
2715
+ "step": 2850
2716
+ },
2717
+ {
2718
+ "epoch": 2.542222222222222,
2719
+ "grad_norm": 2.0818448066711426,
2720
+ "learning_rate": 3.1633587786259545e-05,
2721
+ "loss": 1.5136,
2722
+ "mean_token_accuracy": 0.7040936380624772,
2723
+ "num_tokens": 3218742.0,
2724
+ "step": 2860
2725
+ },
2726
+ {
2727
+ "epoch": 2.551111111111111,
2728
+ "grad_norm": 1.9810820817947388,
2729
+ "learning_rate": 3.102290076335878e-05,
2730
+ "loss": 1.6515,
2731
+ "mean_token_accuracy": 0.6826088905334473,
2732
+ "num_tokens": 3230062.0,
2733
+ "step": 2870
2734
+ },
2735
+ {
2736
+ "epoch": 2.56,
2737
+ "grad_norm": 2.1830689907073975,
2738
+ "learning_rate": 3.0412213740458017e-05,
2739
+ "loss": 1.5792,
2740
+ "mean_token_accuracy": 0.699496129155159,
2741
+ "num_tokens": 3240533.0,
2742
+ "step": 2880
2743
+ },
2744
+ {
2745
+ "epoch": 2.568888888888889,
2746
+ "grad_norm": 2.101184368133545,
2747
+ "learning_rate": 2.9801526717557253e-05,
2748
+ "loss": 1.6538,
2749
+ "mean_token_accuracy": 0.6724523141980171,
2750
+ "num_tokens": 3252476.0,
2751
+ "step": 2890
2752
+ },
2753
+ {
2754
+ "epoch": 2.5777777777777775,
2755
+ "grad_norm": 2.021524429321289,
2756
+ "learning_rate": 2.9190839694656492e-05,
2757
+ "loss": 1.6146,
2758
+ "mean_token_accuracy": 0.6886414483189582,
2759
+ "num_tokens": 3263799.0,
2760
+ "step": 2900
2761
+ },
2762
+ {
2763
+ "epoch": 2.586666666666667,
2764
+ "grad_norm": 1.9668735265731812,
2765
+ "learning_rate": 2.8580152671755728e-05,
2766
+ "loss": 1.6477,
2767
+ "mean_token_accuracy": 0.678925508260727,
2768
+ "num_tokens": 3275511.0,
2769
+ "step": 2910
2770
+ },
2771
+ {
2772
+ "epoch": 2.5955555555555554,
2773
+ "grad_norm": 2.088491201400757,
2774
+ "learning_rate": 2.7969465648854964e-05,
2775
+ "loss": 1.6265,
2776
+ "mean_token_accuracy": 0.6857595339417457,
2777
+ "num_tokens": 3286752.0,
2778
+ "step": 2920
2779
+ },
2780
+ {
2781
+ "epoch": 2.6044444444444443,
2782
+ "grad_norm": 2.0536880493164062,
2783
+ "learning_rate": 2.73587786259542e-05,
2784
+ "loss": 1.66,
2785
+ "mean_token_accuracy": 0.681273227930069,
2786
+ "num_tokens": 3297945.0,
2787
+ "step": 2930
2788
+ },
2789
+ {
2790
+ "epoch": 2.6133333333333333,
2791
+ "grad_norm": 2.0063817501068115,
2792
+ "learning_rate": 2.674809160305344e-05,
2793
+ "loss": 1.5102,
2794
+ "mean_token_accuracy": 0.7025244757533073,
2795
+ "num_tokens": 3309112.0,
2796
+ "step": 2940
2797
+ },
2798
+ {
2799
+ "epoch": 2.6222222222222222,
2800
+ "grad_norm": 1.9980206489562988,
2801
+ "learning_rate": 2.6137404580152675e-05,
2802
+ "loss": 1.5142,
2803
+ "mean_token_accuracy": 0.7049572348594666,
2804
+ "num_tokens": 3320544.0,
2805
+ "step": 2950
2806
+ },
2807
+ {
2808
+ "epoch": 2.631111111111111,
2809
+ "grad_norm": 2.1506435871124268,
2810
+ "learning_rate": 2.552671755725191e-05,
2811
+ "loss": 1.5826,
2812
+ "mean_token_accuracy": 0.694467018544674,
2813
+ "num_tokens": 3331309.0,
2814
+ "step": 2960
2815
+ },
2816
+ {
2817
+ "epoch": 2.64,
2818
+ "grad_norm": 1.9890793561935425,
2819
+ "learning_rate": 2.4916030534351147e-05,
2820
+ "loss": 1.5631,
2821
+ "mean_token_accuracy": 0.6945617944002151,
2822
+ "num_tokens": 3343068.0,
2823
+ "step": 2970
2824
+ },
2825
+ {
2826
+ "epoch": 2.648888888888889,
2827
+ "grad_norm": 2.1102676391601562,
2828
+ "learning_rate": 2.4305343511450383e-05,
2829
+ "loss": 1.6145,
2830
+ "mean_token_accuracy": 0.6866093754768372,
2831
+ "num_tokens": 3354691.0,
2832
+ "step": 2980
2833
+ },
2834
+ {
2835
+ "epoch": 2.6577777777777776,
2836
+ "grad_norm": 2.2881674766540527,
2837
+ "learning_rate": 2.369465648854962e-05,
2838
+ "loss": 1.5796,
2839
+ "mean_token_accuracy": 0.6961612686514854,
2840
+ "num_tokens": 3365512.0,
2841
+ "step": 2990
2842
+ },
2843
+ {
2844
+ "epoch": 2.6666666666666665,
2845
+ "grad_norm": 1.973838210105896,
2846
+ "learning_rate": 2.3083969465648854e-05,
2847
+ "loss": 1.5456,
2848
+ "mean_token_accuracy": 0.703473174571991,
2849
+ "num_tokens": 3376406.0,
2850
+ "step": 3000
2851
+ },
2852
+ {
2853
+ "epoch": 2.6666666666666665,
2854
+ "eval_loss": 1.881131649017334,
2855
+ "eval_mean_token_accuracy": 0.6518214672803879,
2856
+ "eval_num_tokens": 3376406.0,
2857
+ "eval_runtime": 47.794,
2858
+ "eval_samples_per_second": 20.923,
2859
+ "eval_steps_per_second": 10.462,
2860
+ "step": 3000
2861
+ },
2862
+ {
2863
+ "epoch": 2.6755555555555555,
2864
+ "grad_norm": 1.9779133796691895,
2865
+ "learning_rate": 2.2473282442748094e-05,
2866
+ "loss": 1.6538,
2867
+ "mean_token_accuracy": 0.6778925880789757,
2868
+ "num_tokens": 3388024.0,
2869
+ "step": 3010
2870
+ },
2871
+ {
2872
+ "epoch": 2.6844444444444444,
2873
+ "grad_norm": 1.848136305809021,
2874
+ "learning_rate": 2.186259541984733e-05,
2875
+ "loss": 1.5608,
2876
+ "mean_token_accuracy": 0.6985713213682174,
2877
+ "num_tokens": 3399547.0,
2878
+ "step": 3020
2879
+ },
2880
+ {
2881
+ "epoch": 2.6933333333333334,
2882
+ "grad_norm": 2.101651191711426,
2883
+ "learning_rate": 2.1251908396946565e-05,
2884
+ "loss": 1.5501,
2885
+ "mean_token_accuracy": 0.6979974433779716,
2886
+ "num_tokens": 3410179.0,
2887
+ "step": 3030
2888
+ },
2889
+ {
2890
+ "epoch": 2.7022222222222223,
2891
+ "grad_norm": 1.8398933410644531,
2892
+ "learning_rate": 2.06412213740458e-05,
2893
+ "loss": 1.5843,
2894
+ "mean_token_accuracy": 0.6883544474840164,
2895
+ "num_tokens": 3421454.0,
2896
+ "step": 3040
2897
+ },
2898
+ {
2899
+ "epoch": 2.7111111111111112,
2900
+ "grad_norm": 2.011132001876831,
2901
+ "learning_rate": 2.003053435114504e-05,
2902
+ "loss": 1.6012,
2903
+ "mean_token_accuracy": 0.6917843446135521,
2904
+ "num_tokens": 3432951.0,
2905
+ "step": 3050
2906
+ },
2907
+ {
2908
+ "epoch": 2.7199999999999998,
2909
+ "grad_norm": 2.005140542984009,
2910
+ "learning_rate": 1.9419847328244276e-05,
2911
+ "loss": 1.5421,
2912
+ "mean_token_accuracy": 0.6976893007755279,
2913
+ "num_tokens": 3444007.0,
2914
+ "step": 3060
2915
+ },
2916
+ {
2917
+ "epoch": 2.728888888888889,
2918
+ "grad_norm": 2.146664619445801,
2919
+ "learning_rate": 1.8809160305343512e-05,
2920
+ "loss": 1.5799,
2921
+ "mean_token_accuracy": 0.6956974431872368,
2922
+ "num_tokens": 3455510.0,
2923
+ "step": 3070
2924
+ },
2925
+ {
2926
+ "epoch": 2.7377777777777776,
2927
+ "grad_norm": 2.0788283348083496,
2928
+ "learning_rate": 1.8198473282442748e-05,
2929
+ "loss": 1.6043,
2930
+ "mean_token_accuracy": 0.6913327068090439,
2931
+ "num_tokens": 3466684.0,
2932
+ "step": 3080
2933
+ },
2934
+ {
2935
+ "epoch": 2.7466666666666666,
2936
+ "grad_norm": 1.8829123973846436,
2937
+ "learning_rate": 1.7587786259541984e-05,
2938
+ "loss": 1.5649,
2939
+ "mean_token_accuracy": 0.6947105377912521,
2940
+ "num_tokens": 3477804.0,
2941
+ "step": 3090
2942
+ },
2943
+ {
2944
+ "epoch": 2.7555555555555555,
2945
+ "grad_norm": 1.9475817680358887,
2946
+ "learning_rate": 1.6977099236641223e-05,
2947
+ "loss": 1.5568,
2948
+ "mean_token_accuracy": 0.7034636497497558,
2949
+ "num_tokens": 3488846.0,
2950
+ "step": 3100
2951
+ },
2952
+ {
2953
+ "epoch": 2.7644444444444445,
2954
+ "grad_norm": 2.098478317260742,
2955
+ "learning_rate": 1.636641221374046e-05,
2956
+ "loss": 1.5575,
2957
+ "mean_token_accuracy": 0.7053634539246559,
2958
+ "num_tokens": 3499405.0,
2959
+ "step": 3110
2960
+ },
2961
+ {
2962
+ "epoch": 2.7733333333333334,
2963
+ "grad_norm": 2.041572093963623,
2964
+ "learning_rate": 1.5755725190839695e-05,
2965
+ "loss": 1.619,
2966
+ "mean_token_accuracy": 0.6887963160872459,
2967
+ "num_tokens": 3511004.0,
2968
+ "step": 3120
2969
+ },
2970
+ {
2971
+ "epoch": 2.7822222222222224,
2972
+ "grad_norm": 2.0892608165740967,
2973
+ "learning_rate": 1.5145038167938933e-05,
2974
+ "loss": 1.55,
2975
+ "mean_token_accuracy": 0.6963776037096977,
2976
+ "num_tokens": 3521755.0,
2977
+ "step": 3130
2978
+ },
2979
+ {
2980
+ "epoch": 2.7911111111111113,
2981
+ "grad_norm": 1.9754984378814697,
2982
+ "learning_rate": 1.4534351145038168e-05,
2983
+ "loss": 1.5459,
2984
+ "mean_token_accuracy": 0.7077917411923409,
2985
+ "num_tokens": 3532621.0,
2986
+ "step": 3140
2987
+ },
2988
+ {
2989
+ "epoch": 2.8,
2990
+ "grad_norm": 1.9490447044372559,
2991
+ "learning_rate": 1.3923664122137406e-05,
2992
+ "loss": 1.6047,
2993
+ "mean_token_accuracy": 0.6932125955820083,
2994
+ "num_tokens": 3543418.0,
2995
+ "step": 3150
2996
+ },
2997
+ {
2998
+ "epoch": 2.8088888888888888,
2999
+ "grad_norm": 2.12741756439209,
3000
+ "learning_rate": 1.3312977099236642e-05,
3001
+ "loss": 1.6336,
3002
+ "mean_token_accuracy": 0.6868860185146332,
3003
+ "num_tokens": 3555172.0,
3004
+ "step": 3160
3005
+ },
3006
+ {
3007
+ "epoch": 2.8177777777777777,
3008
+ "grad_norm": 1.9473916292190552,
3009
+ "learning_rate": 1.270229007633588e-05,
3010
+ "loss": 1.5765,
3011
+ "mean_token_accuracy": 0.696508777141571,
3012
+ "num_tokens": 3565975.0,
3013
+ "step": 3170
3014
+ },
3015
+ {
3016
+ "epoch": 2.8266666666666667,
3017
+ "grad_norm": 2.065030336380005,
3018
+ "learning_rate": 1.2091603053435115e-05,
3019
+ "loss": 1.6127,
3020
+ "mean_token_accuracy": 0.6915735498070716,
3021
+ "num_tokens": 3578154.0,
3022
+ "step": 3180
3023
+ },
3024
+ {
3025
+ "epoch": 2.8355555555555556,
3026
+ "grad_norm": 2.1202714443206787,
3027
+ "learning_rate": 1.1480916030534351e-05,
3028
+ "loss": 1.5786,
3029
+ "mean_token_accuracy": 0.702069939672947,
3030
+ "num_tokens": 3589470.0,
3031
+ "step": 3190
3032
+ },
3033
+ {
3034
+ "epoch": 2.8444444444444446,
3035
+ "grad_norm": 2.081028699874878,
3036
+ "learning_rate": 1.0870229007633589e-05,
3037
+ "loss": 1.6146,
3038
+ "mean_token_accuracy": 0.6874286815524101,
3039
+ "num_tokens": 3600489.0,
3040
+ "step": 3200
3041
+ },
3042
+ {
3043
+ "epoch": 2.8444444444444446,
3044
+ "eval_loss": 1.8764336109161377,
3045
+ "eval_mean_token_accuracy": 0.6531118412017822,
3046
+ "eval_num_tokens": 3600489.0,
3047
+ "eval_runtime": 47.0874,
3048
+ "eval_samples_per_second": 21.237,
3049
+ "eval_steps_per_second": 10.619,
3050
+ "step": 3200
3051
+ }
3052
+ ],
3053
+ "logging_steps": 10,
3054
+ "max_steps": 3375,
3055
+ "num_input_tokens_seen": 0,
3056
+ "num_train_epochs": 3,
3057
+ "save_steps": 200,
3058
+ "stateful_callbacks": {
3059
+ "TrainerControl": {
3060
+ "args": {
3061
+ "should_epoch_stop": false,
3062
+ "should_evaluate": false,
3063
+ "should_log": false,
3064
+ "should_save": true,
3065
+ "should_training_stop": false
3066
+ },
3067
+ "attributes": {}
3068
+ }
3069
+ },
3070
+ "total_flos": 1.1419054395445248e+16,
3071
+ "train_batch_size": 2,
3072
+ "trial_name": null,
3073
+ "trial_params": null
3074
+ }
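The eval records logged in the trainer state above show validation loss improving at every 200-step checkpoint (1.8971 → 1.8940 → 1.8869 → 1.8811 → 1.8764 between steps 2400 and 3200). A minimal sketch, assuming `trainer_state.json` is read as ordinary JSON, of extracting that trend from a `log_history`-shaped dict (values below are copied from the records above, trimmed to the relevant fields):

```python
# A trimmed stand-in for the trainer_state.json above; only the fields
# needed here are kept (values copied from the logged eval records).
state = {
    "log_history": [
        {"step": 2400, "eval_loss": 1.8970768451690674},
        {"step": 2500, "loss": 1.6096},  # train-only record, no eval_loss
        {"step": 2600, "eval_loss": 1.8940143585205078},
        {"step": 2800, "eval_loss": 1.8869248628616333},
        {"step": 3000, "eval_loss": 1.881131649017334},
        {"step": 3200, "eval_loss": 1.8764336109161377},
    ]
}

def eval_curve(state):
    """Return (step, eval_loss) pairs from a trainer_state-style dict,
    skipping train-only records."""
    return [(r["step"], r["eval_loss"])
            for r in state["log_history"] if "eval_loss" in r]

curve = eval_curve(state)
losses = [loss for _, loss in curve]
print(curve[-1])                               # most recent eval checkpoint
print(losses == sorted(losses, reverse=True))  # monotonically improving here
```

The same function works on the full file via `json.load(open("trainer_state.json"))`, since `log_history` mixes train and eval records and only the latter carry `eval_loss`.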
checkpoint-3200/training_args.bin ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:ef7dcfbb520eaa72697f1d45b91e159189ecefb58152a35a5fa5a95eb7d53aa9
3
+ size 5624
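The three-line stubs for `training_args.bin` and the other binary checkpoint files are Git LFS pointer files, not the payloads themselves: each records the spec version, a `sha256` object id, and the true byte size of the blob. A small sketch of parsing that key-value format (using the pointer above as the example):

```python
def parse_lfs_pointer(text):
    """Parse a Git LFS pointer file: one 'key value' pair per line
    (version / oid / size)."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    return fields

pointer = """version https://git-lfs.github.com/spec/v1
oid sha256:ef7dcfbb520eaa72697f1d45b91e159189ecefb58152a35a5fa5a95eb7d53aa9
size 5624"""

info = parse_lfs_pointer(pointer)
print(info["size"])   # byte size of the real blob, here 5624
print(info["oid"])    # content address used to fetch it from LFS storage
```

Cloning without LFS installed leaves exactly these pointer texts on disk, which is why the diff can render them inline while `vocab.json` and `merges.txt` cannot.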
checkpoint-3200/vocab.json ADDED
The diff for this file is too large to render. See raw diff
checkpoint-3375/README.md ADDED
@@ -0,0 +1,202 @@
1
+ ---
2
+ base_model: naver-hyperclovax/HyperCLOVAX-SEED-Text-Instruct-0.5B
3
+ library_name: peft
4
+ ---
5
+
6
+ # Model Card for Model ID
7
+
8
+ <!-- Provide a quick summary of what the model is/does. -->
9
+
10
+
11
+
12
+ ## Model Details
13
+
14
+ ### Model Description
15
+
16
+ <!-- Provide a longer summary of what this model is. -->
17
+
18
+
19
+
20
+ - **Developed by:** [More Information Needed]
21
+ - **Funded by [optional]:** [More Information Needed]
22
+ - **Shared by [optional]:** [More Information Needed]
23
+ - **Model type:** [More Information Needed]
24
+ - **Language(s) (NLP):** [More Information Needed]
25
+ - **License:** [More Information Needed]
26
+ - **Finetuned from model [optional]:** [More Information Needed]
27
+
28
+ ### Model Sources [optional]
29
+
30
+ <!-- Provide the basic links for the model. -->
31
+
32
+ - **Repository:** [More Information Needed]
33
+ - **Paper [optional]:** [More Information Needed]
34
+ - **Demo [optional]:** [More Information Needed]
35
+
36
+ ## Uses
37
+
38
+ <!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
39
+
40
+ ### Direct Use
41
+
42
+ <!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
43
+
44
+ [More Information Needed]
45
+
46
+ ### Downstream Use [optional]
47
+
48
+ <!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
49
+
50
+ [More Information Needed]
51
+
52
+ ### Out-of-Scope Use
53
+
54
+ <!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
55
+
56
+ [More Information Needed]
57
+
58
+ ## Bias, Risks, and Limitations
59
+
60
+ <!-- This section is meant to convey both technical and sociotechnical limitations. -->
61
+
62
+ [More Information Needed]
63
+
64
+ ### Recommendations
65
+
66
+ <!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
67
+
68
+ Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
69
+
70
+ ## How to Get Started with the Model
71
+
72
+ Use the code below to get started with the model.
73
+
74
+ [More Information Needed]
75
+
76
+ ## Training Details
77
+
78
+ ### Training Data
79
+
80
+ <!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
81
+
82
+ [More Information Needed]
83
+
84
+ ### Training Procedure
85
+
86
+ <!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
87
+
88
+ #### Preprocessing [optional]
89
+
90
+ [More Information Needed]
91
+
92
+
93
+ #### Training Hyperparameters
94
+
95
+ - **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
96
+
97
+ #### Speeds, Sizes, Times [optional]
98
+
99
+ <!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
100
+
101
+ [More Information Needed]
102
+
103
+ ## Evaluation
104
+
105
+ <!-- This section describes the evaluation protocols and provides the results. -->
106
+
107
+ ### Testing Data, Factors & Metrics
108
+
109
+ #### Testing Data
110
+
111
+ <!-- This should link to a Dataset Card if possible. -->
112
+
113
+ [More Information Needed]
114
+
115
+ #### Factors
116
+
117
+ <!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
118
+
119
+ [More Information Needed]
120
+
121
+ #### Metrics
122
+
123
+ <!-- These are the evaluation metrics being used, ideally with a description of why. -->
124
+
125
+ [More Information Needed]
126
+
127
+ ### Results
128
+
129
+ [More Information Needed]
130
+
131
+ #### Summary
132
+
133
+
134
+
135
+ ## Model Examination [optional]
136
+
137
+ <!-- Relevant interpretability work for the model goes here -->
138
+
139
+ [More Information Needed]
140
+
141
+ ## Environmental Impact
142
+
143
+ <!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
144
+
145
+ Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
146
+
147
+ - **Hardware Type:** [More Information Needed]
148
+ - **Hours used:** [More Information Needed]
149
+ - **Cloud Provider:** [More Information Needed]
150
+ - **Compute Region:** [More Information Needed]
151
+ - **Carbon Emitted:** [More Information Needed]
152
+
153
+ ## Technical Specifications [optional]
154
+
155
+ ### Model Architecture and Objective
156
+
157
+ [More Information Needed]
158
+
159
+ ### Compute Infrastructure
160
+
161
+ [More Information Needed]
162
+
163
+ #### Hardware
164
+
165
+ [More Information Needed]
166
+
167
+ #### Software
168
+
169
+ [More Information Needed]
170
+
171
+ ## Citation [optional]
172
+
173
+ <!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
174
+
175
+ **BibTeX:**
176
+
177
+ [More Information Needed]
178
+
179
+ **APA:**
180
+
181
+ [More Information Needed]
182
+
183
+ ## Glossary [optional]
184
+
185
+ <!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
186
+
187
+ [More Information Needed]
188
+
189
+ ## More Information [optional]
190
+
191
+ [More Information Needed]
192
+
193
+ ## Model Card Authors [optional]
194
+
195
+ [More Information Needed]
196
+
197
+ ## Model Card Contact
198
+
199
+ [More Information Needed]
200
+ ### Framework versions
201
+
202
+ - PEFT 0.15.2
checkpoint-3375/adapter_config.json ADDED
@@ -0,0 +1,39 @@
1
+ {
2
+ "alpha_pattern": {},
3
+ "auto_mapping": null,
4
+ "base_model_name_or_path": "naver-hyperclovax/HyperCLOVAX-SEED-Text-Instruct-0.5B",
5
+ "bias": "none",
6
+ "corda_config": null,
7
+ "eva_config": null,
8
+ "exclude_modules": null,
9
+ "fan_in_fan_out": false,
10
+ "inference_mode": true,
11
+ "init_lora_weights": true,
12
+ "layer_replication": null,
13
+ "layers_pattern": null,
14
+ "layers_to_transform": null,
15
+ "loftq_config": {},
16
+ "lora_alpha": 32,
17
+ "lora_bias": false,
18
+ "lora_dropout": 0.1,
19
+ "megatron_config": null,
20
+ "megatron_core": "megatron.core",
21
+ "modules_to_save": null,
22
+ "peft_type": "LORA",
23
+ "r": 16,
24
+ "rank_pattern": {},
25
+ "revision": null,
26
+ "target_modules": [
27
+ "q_proj",
28
+ "v_proj",
29
+ "k_proj",
30
+ "up_proj",
31
+ "gate_proj",
32
+ "o_proj",
33
+ "down_proj"
34
+ ],
35
+ "task_type": "CAUSAL_LM",
36
+ "trainable_token_indices": null,
37
+ "use_dora": false,
38
+ "use_rslora": false
39
+ }
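The adapter config above sets `r: 16` and `lora_alpha: 32`, so the LoRA update applied to each listed projection (`q_proj`, `v_proj`, etc.) is scaled by `alpha / r = 2.0`: the effective forward pass is `W x + (alpha / r) · B(A x)`, with `A` of shape `r × d_in` and `B` of shape `d_out × r`. A pure-Python sketch of that update rule; the tiny matrices are made-up toy values for illustration, not the 0.5B model's real weights:

```python
def matvec(M, x):
    """Plain matrix-vector product over nested lists."""
    return [sum(m_ij * x_j for m_ij, x_j in zip(row, x)) for row in M]

def lora_forward(W, A, B, x, r, lora_alpha):
    """y = W x + (lora_alpha / r) * B (A x), the standard LoRA update.
    With r=16, lora_alpha=32 as in adapter_config.json, scaling is 2.0."""
    scaling = lora_alpha / r
    base = matvec(W, x)                # frozen pretrained path
    delta = matvec(B, matvec(A, x))    # low-rank adapter path
    return [b + scaling * d for b, d in zip(base, delta)]

# Toy shapes (d_in = d_out = 2, rank collapsed to 1 for readability);
# the alpha/r ratio is kept at 2.0 to match the config.
W = [[1.0, 0.0], [0.0, 1.0]]   # frozen base weight (identity here)
A = [[1.0, 1.0]]               # r x d_in; small random init in practice
B = [[0.5], [0.0]]             # d_out x r; zero init in practice
print(lora_forward(W, A, B, [1.0, 2.0], r=1, lora_alpha=2))  # [4.0, 2.0]
```

Because `B` is initialized to zero in real training, the adapter starts as a no-op and only the low-rank path receives gradients, which is what keeps the saved `adapter_model.safetensors` at ~39 MB.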
checkpoint-3375/adapter_model.safetensors ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:312d13462cd7449243d76d3d53001e607283f1375d116df2efc112756649f1f3
3
+ size 39366152
checkpoint-3375/added_tokens.json ADDED
@@ -0,0 +1,35 @@
1
+ {
2
+ "<EMAIL>": 110521,
3
+ "<KEY>": 110522,
4
+ "<NAME>": 110520,
5
+ "<PASSWORD>": 110523,
6
+ "<code_to_intermediate>": 110502,
7
+ "<empty_output>": 110501,
8
+ "<file_sep>": 110492,
9
+ "<intermediate_to_code>": 110503,
10
+ "<issue_closed>": 110495,
11
+ "<issue_comment>": 110494,
12
+ "<issue_start>": 110493,
13
+ "<jupyter_code>": 110498,
14
+ "<jupyter_output>": 110499,
15
+ "<jupyter_script>": 110500,
16
+ "<jupyter_start>": 110496,
17
+ "<jupyter_text>": 110497,
18
+ "<pr>": 110504,
19
+ "<pr_base>": 110507,
20
+ "<pr_base_code>": 110509,
21
+ "<pr_comment>": 110512,
22
+ "<pr_diff>": 110510,
23
+ "<pr_diff_hunk>": 110511,
24
+ "<pr_diff_hunk_comment_line>": 110519,
25
+ "<pr_event_id>": 110513,
26
+ "<pr_file>": 110508,
27
+ "<pr_in_reply_to_comment_id>": 110518,
28
+ "<pr_in_reply_to_review_id>": 110517,
29
+ "<pr_is_merged>": 110506,
30
+ "<pr_review>": 110514,
31
+ "<pr_review_comment>": 110516,
32
+ "<pr_review_state>": 110515,
33
+ "<pr_status>": 110505,
34
+ "<repo_name>": 110491
35
+ }
checkpoint-3375/chat_template.jinja ADDED
@@ -0,0 +1,4 @@
+ {% if not add_generation_prompt is defined %}{% set add_generation_prompt = false %}{% endif %}{% for message in messages %}{{'<|im_start|>' + message['role'] + '
+ ' + message['content'] + '<|im_end|>' + '
+ '}}{% endfor %}{% if add_generation_prompt %}{{ '<|im_start|>assistant
+ ' }}{% endif %}
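The Jinja template above renders conversations in ChatML style: each message becomes `<|im_start|>{role}\n{content}<|im_end|>\n`, and when `add_generation_prompt` is true an open assistant header is appended so the model continues from there. A minimal Python sketch that reproduces the same rendering without loading the tokenizer (the message list is a hypothetical example):

```python
def render_chat(messages, add_generation_prompt=False):
    """Mirror chat_template.jinja: one ChatML block per message,
    plus an optional trailing '<|im_start|>assistant\n' header."""
    text = "".join(
        f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n" for m in messages
    )
    if add_generation_prompt:
        text += "<|im_start|>assistant\n"
    return text

msgs = [{"role": "user", "content": "hello"}]
print(render_chat(msgs, add_generation_prompt=True))
```

In practice `tokenizer.apply_chat_template(msgs, tokenize=False, add_generation_prompt=True)` produces this same string from the template file.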
checkpoint-3375/merges.txt ADDED
The diff for this file is too large to render. See raw diff
 
checkpoint-3375/rng_state.pth ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:07d60418b383317f571e91debf117825c79793f4ee64ec7b2ea978b61b64b3b4
+ size 14244
checkpoint-3375/scaler.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:947399a05506c2e46af067da8feaec1b314f23e1a1053d9ababad786abd55f2b
+ size 988
checkpoint-3375/scheduler.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:5f746b05629bb941ef28c25523458a3f637fb70f0c6a02c362d1aa3c1c3761bd
+ size 1064
checkpoint-3375/special_tokens_map.json ADDED
@@ -0,0 +1,86 @@
+ {
+ "additional_special_tokens": [
+ "<|endoftext|>",
+ "<|fim_prefix|>",
+ "<|fim_middle|>",
+ "<|fim_suffix|>",
+ "<|endofprompt|>",
+ "<|_unuse_missing_100256|>",
+ "<|_unuse_missing_100261|>",
+ "<|_unuse_missing_100262|>",
+ "<|_unuse_missing_100263|>",
+ "<|_unuse_missing_100264|>",
+ "<|_unuse_missing_100265|>",
+ "<|_unuse_missing_100266|>",
+ "<|_unuse_missing_100267|>",
+ "<|_unuse_missing_100268|>",
+ "<|_unuse_missing_100269|>",
+ "<|_unuse_missing_100270|>",
+ "<|_unuse_missing_100271|>",
+ "<|im_start|>",
+ "<|im_end|>",
+ "<|stop|>",
+ "<|endofturn|>",
+ "<repo_name>",
+ "<file_sep>",
+ "<issue_start>",
+ "<issue_comment>",
+ "<issue_closed>",
+ "<jupyter_start>",
+ "<jupyter_text>",
+ "<jupyter_code>",
+ "<jupyter_output>",
+ "<jupyter_script>",
+ "<empty_output>",
+ "<code_to_intermediate>",
+ "<intermediate_to_code>",
+ "<pr>",
+ "<pr_status>",
+ "<pr_is_merged>",
+ "<pr_base>",
+ "<pr_file>",
+ "<pr_base_code>",
+ "<pr_diff>",
+ "<pr_diff_hunk>",
+ "<pr_comment>",
+ "<pr_event_id>",
+ "<pr_review>",
+ "<pr_review_state>",
+ "<pr_review_comment>",
+ "<pr_in_reply_to_review_id>",
+ "<pr_in_reply_to_comment_id>",
+ "<pr_diff_hunk_comment_line>",
+ "<NAME>",
+ "<EMAIL>",
+ "<KEY>",
+ "<PASSWORD>"
+ ],
+ "bos_token": {
+ "content": "<|endoftext|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false
+ },
+ "eos_token": {
+ "content": "<|endofturn|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false
+ },
+ "pad_token": {
+ "content": "<|endoftext|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false
+ },
+ "unk_token": {
+ "content": "<|endoftext|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false
+ }
+ }
checkpoint-3375/tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
 
checkpoint-3375/tokenizer_config.json ADDED
@@ -0,0 +1,501 @@
+ {
+ "add_prefix_space": false,
+ "added_tokens_decoder": {
+ "100256": {
+ "content": "<|_unuse_missing_100256|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "100257": {
+ "content": "<|endoftext|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "100258": {
+ "content": "<|fim_prefix|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "100259": {
+ "content": "<|fim_middle|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "100260": {
+ "content": "<|fim_suffix|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "100261": {
+ "content": "<|_unuse_missing_100261|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "100262": {
+ "content": "<|_unuse_missing_100262|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "100263": {
+ "content": "<|_unuse_missing_100263|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "100264": {
+ "content": "<|_unuse_missing_100264|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "100265": {
+ "content": "<|_unuse_missing_100265|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "100266": {
+ "content": "<|_unuse_missing_100266|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "100267": {
+ "content": "<|_unuse_missing_100267|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "100268": {
+ "content": "<|_unuse_missing_100268|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "100269": {
+ "content": "<|_unuse_missing_100269|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "100270": {
+ "content": "<|_unuse_missing_100270|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "100271": {
+ "content": "<|_unuse_missing_100271|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "100272": {
+ "content": "<|im_start|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "100273": {
+ "content": "<|im_end|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "100274": {
+ "content": "<|stop|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "100275": {
+ "content": "<|endofturn|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "100276": {
+ "content": "<|endofprompt|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "110491": {
+ "content": "<repo_name>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "110492": {
+ "content": "<file_sep>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "110493": {
+ "content": "<issue_start>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "110494": {
+ "content": "<issue_comment>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "110495": {
+ "content": "<issue_closed>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "110496": {
+ "content": "<jupyter_start>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "110497": {
+ "content": "<jupyter_text>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "110498": {
+ "content": "<jupyter_code>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "110499": {
+ "content": "<jupyter_output>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "110500": {
+ "content": "<jupyter_script>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "110501": {
+ "content": "<empty_output>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "110502": {
+ "content": "<code_to_intermediate>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "110503": {
+ "content": "<intermediate_to_code>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "110504": {
+ "content": "<pr>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "110505": {
+ "content": "<pr_status>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "110506": {
+ "content": "<pr_is_merged>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "110507": {
+ "content": "<pr_base>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "110508": {
+ "content": "<pr_file>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "110509": {
+ "content": "<pr_base_code>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "110510": {
+ "content": "<pr_diff>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "110511": {
+ "content": "<pr_diff_hunk>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "110512": {
+ "content": "<pr_comment>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "110513": {
+ "content": "<pr_event_id>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "110514": {
+ "content": "<pr_review>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "110515": {
+ "content": "<pr_review_state>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "110516": {
+ "content": "<pr_review_comment>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "110517": {
+ "content": "<pr_in_reply_to_review_id>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "110518": {
+ "content": "<pr_in_reply_to_comment_id>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "110519": {
+ "content": "<pr_diff_hunk_comment_line>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "110520": {
+ "content": "<NAME>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "110521": {
+ "content": "<EMAIL>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "110522": {
+ "content": "<KEY>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "110523": {
+ "content": "<PASSWORD>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ }
+ },
+ "additional_special_tokens": [
+ "<|endoftext|>",
+ "<|fim_prefix|>",
+ "<|fim_middle|>",
+ "<|fim_suffix|>",
+ "<|endofprompt|>",
+ "<|_unuse_missing_100256|>",
+ "<|_unuse_missing_100261|>",
+ "<|_unuse_missing_100262|>",
+ "<|_unuse_missing_100263|>",
+ "<|_unuse_missing_100264|>",
+ "<|_unuse_missing_100265|>",
+ "<|_unuse_missing_100266|>",
+ "<|_unuse_missing_100267|>",
+ "<|_unuse_missing_100268|>",
+ "<|_unuse_missing_100269|>",
+ "<|_unuse_missing_100270|>",
+ "<|_unuse_missing_100271|>",
+ "<|im_start|>",
+ "<|im_end|>",
+ "<|stop|>",
+ "<|endofturn|>",
+ "<repo_name>",
+ "<file_sep>",
+ "<issue_start>",
+ "<issue_comment>",
+ "<issue_closed>",
+ "<jupyter_start>",
+ "<jupyter_text>",
+ "<jupyter_code>",
+ "<jupyter_output>",
+ "<jupyter_script>",
+ "<empty_output>",
+ "<code_to_intermediate>",
+ "<intermediate_to_code>",
+ "<pr>",
+ "<pr_status>",
+ "<pr_is_merged>",
+ "<pr_base>",
+ "<pr_file>",
+ "<pr_base_code>",
+ "<pr_diff>",
+ "<pr_diff_hunk>",
+ "<pr_comment>",
+ "<pr_event_id>",
+ "<pr_review>",
+ "<pr_review_state>",
+ "<pr_review_comment>",
+ "<pr_in_reply_to_review_id>",
+ "<pr_in_reply_to_comment_id>",
+ "<pr_diff_hunk_comment_line>",
+ "<NAME>",
+ "<EMAIL>",
+ "<KEY>",
+ "<PASSWORD>"
+ ],
+ "bos_token": "<|endoftext|>",
+ "clean_up_tokenization_spaces": true,
+ "eos_token": "<|endofturn|>",
+ "extra_special_tokens": {},
+ "model_max_length": 1000000000000000019884624838656,
+ "pad_token": "<|endoftext|>",
+ "tokenizer_class": "GPT2Tokenizer",
+ "unk_token": "<|endoftext|>"
+ }
checkpoint-3375/trainer_state.json ADDED
@@ -0,0 +1,3227 @@
+ {
+ "best_global_step": 3200,
+ "best_metric": 1.8764336109161377,
+ "best_model_checkpoint": "/content/drive/MyDrive/hyperclova-deobfuscation-lora/checkpoint-3200",
+ "epoch": 3.0,
+ "eval_steps": 200,
+ "global_step": 3375,
+ "is_hyper_param_search": false,
+ "is_local_process_zero": true,
+ "is_world_process_zero": true,
+ "log_history": [
+ {
+ "epoch": 0.008888888888888889,
+ "grad_norm": 3.629798412322998,
+ "learning_rate": 1.8e-05,
+ "loss": 4.1483,
+ "mean_token_accuracy": 0.34797456339001653,
+ "num_tokens": 11242.0,
+ "step": 10
+ },
+ {
+ "epoch": 0.017777777777777778,
+ "grad_norm": 2.6125221252441406,
+ "learning_rate": 3.8e-05,
+ "loss": 3.7515,
+ "mean_token_accuracy": 0.4058148756623268,
+ "num_tokens": 22106.0,
+ "step": 20
+ },
+ {
+ "epoch": 0.02666666666666667,
+ "grad_norm": 2.9313137531280518,
+ "learning_rate": 5.8e-05,
+ "loss": 3.3279,
+ "mean_token_accuracy": 0.4703808955848217,
+ "num_tokens": 33774.0,
+ "step": 30
+ },
+ {
+ "epoch": 0.035555555555555556,
+ "grad_norm": 2.0496416091918945,
+ "learning_rate": 7.800000000000001e-05,
+ "loss": 2.9114,
+ "mean_token_accuracy": 0.5239812344312668,
+ "num_tokens": 44943.0,
+ "step": 40
+ },
+ {
+ "epoch": 0.044444444444444446,
+ "grad_norm": 2.282668352127075,
+ "learning_rate": 9.8e-05,
+ "loss": 2.8468,
+ "mean_token_accuracy": 0.534189497679472,
+ "num_tokens": 56341.0,
+ "step": 50
+ },
+ {
+ "epoch": 0.05333333333333334,
+ "grad_norm": 2.168651819229126,
+ "learning_rate": 0.000118,
+ "loss": 2.7785,
+ "mean_token_accuracy": 0.5407359585165977,
+ "num_tokens": 67397.0,
+ "step": 60
+ },
+ {
+ "epoch": 0.06222222222222222,
+ "grad_norm": 2.289881467819214,
+ "learning_rate": 0.000138,
+ "loss": 2.736,
+ "mean_token_accuracy": 0.5326176360249519,
+ "num_tokens": 78482.0,
+ "step": 70
+ },
+ {
+ "epoch": 0.07111111111111111,
+ "grad_norm": 2.1038105487823486,
+ "learning_rate": 0.00015800000000000002,
+ "loss": 2.5855,
+ "mean_token_accuracy": 0.5618595249950886,
+ "num_tokens": 89803.0,
+ "step": 80
+ },
+ {
+ "epoch": 0.08,
+ "grad_norm": 2.24312686920166,
+ "learning_rate": 0.00017800000000000002,
+ "loss": 2.5365,
+ "mean_token_accuracy": 0.5661972932517528,
+ "num_tokens": 101015.0,
+ "step": 90
+ },
+ {
+ "epoch": 0.08888888888888889,
+ "grad_norm": 1.9482938051223755,
+ "learning_rate": 0.00019800000000000002,
+ "loss": 2.5634,
+ "mean_token_accuracy": 0.5538406319916248,
+ "num_tokens": 112364.0,
+ "step": 100
+ },
+ {
+ "epoch": 0.09777777777777778,
+ "grad_norm": 1.86210298538208,
+ "learning_rate": 0.00019945038167938932,
+ "loss": 2.4629,
+ "mean_token_accuracy": 0.5780388668179512,
+ "num_tokens": 122882.0,
+ "step": 110
+ },
+ {
+ "epoch": 0.10666666666666667,
+ "grad_norm": 1.8806918859481812,
+ "learning_rate": 0.00019883969465648855,
+ "loss": 2.5022,
+ "mean_token_accuracy": 0.563551553338766,
+ "num_tokens": 134028.0,
+ "step": 120
+ },
+ {
+ "epoch": 0.11555555555555555,
+ "grad_norm": 2.3264434337615967,
+ "learning_rate": 0.00019829007633587786,
+ "loss": 2.4065,
+ "mean_token_accuracy": 0.5807355619966984,
+ "num_tokens": 145192.0,
+ "step": 130
+ },
+ {
+ "epoch": 0.12444444444444444,
+ "grad_norm": 1.8537976741790771,
+ "learning_rate": 0.00019767938931297712,
+ "loss": 2.4838,
+ "mean_token_accuracy": 0.566282794624567,
+ "num_tokens": 156703.0,
+ "step": 140
+ },
+ {
+ "epoch": 0.13333333333333333,
+ "grad_norm": 2.0960652828216553,
+ "learning_rate": 0.00019706870229007636,
+ "loss": 2.4119,
+ "mean_token_accuracy": 0.5830203481018543,
+ "num_tokens": 168041.0,
+ "step": 150
+ },
+ {
+ "epoch": 0.14222222222222222,
+ "grad_norm": 2.2244813442230225,
+ "learning_rate": 0.00019645801526717557,
+ "loss": 2.3726,
+ "mean_token_accuracy": 0.5844443172216416,
+ "num_tokens": 178986.0,
+ "step": 160
+ },
+ {
+ "epoch": 0.1511111111111111,
+ "grad_norm": 1.8238722085952759,
+ "learning_rate": 0.0001958473282442748,
+ "loss": 2.4419,
+ "mean_token_accuracy": 0.5708602093160152,
+ "num_tokens": 190391.0,
+ "step": 170
+ },
+ {
+ "epoch": 0.16,
+ "grad_norm": 1.7154136896133423,
+ "learning_rate": 0.00019523664122137407,
+ "loss": 2.4293,
+ "mean_token_accuracy": 0.5748118035495281,
+ "num_tokens": 201989.0,
+ "step": 180
+ },
+ {
+ "epoch": 0.1688888888888889,
+ "grad_norm": 1.7582788467407227,
+ "learning_rate": 0.0001946259541984733,
+ "loss": 2.3577,
+ "mean_token_accuracy": 0.5877166777849198,
+ "num_tokens": 212914.0,
+ "step": 190
+ },
+ {
+ "epoch": 0.17777777777777778,
+ "grad_norm": 1.8613263368606567,
+ "learning_rate": 0.0001940152671755725,
+ "loss": 2.3486,
+ "mean_token_accuracy": 0.5889834299683571,
+ "num_tokens": 223936.0,
+ "step": 200
+ },
+ {
+ "epoch": 0.17777777777777778,
+ "eval_loss": 2.3320820331573486,
+ "eval_mean_token_accuracy": 0.5868698905706405,
+ "eval_num_tokens": 223936.0,
+ "eval_runtime": 49.2429,
+ "eval_samples_per_second": 20.307,
+ "eval_steps_per_second": 10.154,
+ "step": 200
+ },
+ {
+ "epoch": 0.18666666666666668,
+ "grad_norm": 1.8486477136611938,
+ "learning_rate": 0.00019340458015267175,
+ "loss": 2.3666,
+ "mean_token_accuracy": 0.5847611322999,
+ "num_tokens": 235036.0,
+ "step": 210
+ },
+ {
+ "epoch": 0.19555555555555557,
+ "grad_norm": 2.018049478530884,
+ "learning_rate": 0.000192793893129771,
+ "loss": 2.2689,
+ "mean_token_accuracy": 0.59971177354455,
+ "num_tokens": 246101.0,
+ "step": 220
+ },
+ {
+ "epoch": 0.20444444444444446,
+ "grad_norm": 1.7244890928268433,
+ "learning_rate": 0.00019218320610687024,
+ "loss": 2.3262,
+ "mean_token_accuracy": 0.5855986528098583,
+ "num_tokens": 257953.0,
+ "step": 230
+ },
+ {
+ "epoch": 0.21333333333333335,
+ "grad_norm": 1.8928934335708618,
+ "learning_rate": 0.00019157251908396948,
+ "loss": 2.3318,
+ "mean_token_accuracy": 0.5885626815259457,
+ "num_tokens": 269187.0,
+ "step": 240
+ },
+ {
+ "epoch": 0.2222222222222222,
+ "grad_norm": 1.7358920574188232,
+ "learning_rate": 0.0001909618320610687,
+ "loss": 2.2145,
+ "mean_token_accuracy": 0.6092555984854698,
+ "num_tokens": 279762.0,
+ "step": 250
+ },
+ {
+ "epoch": 0.2311111111111111,
+ "grad_norm": 1.6779032945632935,
+ "learning_rate": 0.00019035114503816795,
+ "loss": 2.3152,
+ "mean_token_accuracy": 0.584602715075016,
+ "num_tokens": 291454.0,
+ "step": 260
+ },
+ {
+ "epoch": 0.24,
+ "grad_norm": 1.6310207843780518,
+ "learning_rate": 0.0001897404580152672,
+ "loss": 2.2669,
+ "mean_token_accuracy": 0.5965895019471645,
+ "num_tokens": 302969.0,
+ "step": 270
+ },
+ {
+ "epoch": 0.24888888888888888,
+ "grad_norm": 1.6765615940093994,
+ "learning_rate": 0.00018912977099236642,
+ "loss": 2.269,
+ "mean_token_accuracy": 0.5934441670775413,
+ "num_tokens": 314204.0,
+ "step": 280
+ },
+ {
+ "epoch": 0.2577777777777778,
+ "grad_norm": 1.793959617614746,
+ "learning_rate": 0.00018851908396946566,
+ "loss": 2.2554,
+ "mean_token_accuracy": 0.600947193801403,
+ "num_tokens": 325649.0,
+ "step": 290
+ },
+ {
+ "epoch": 0.26666666666666666,
+ "grad_norm": 1.7492129802703857,
+ "learning_rate": 0.0001879083969465649,
+ "loss": 2.2157,
+ "mean_token_accuracy": 0.6022505328059197,
+ "num_tokens": 337167.0,
+ "step": 300
+ },
+ {
+ "epoch": 0.27555555555555555,
+ "grad_norm": 1.803576946258545,
+ "learning_rate": 0.00018729770992366413,
+ "loss": 2.2854,
+ "mean_token_accuracy": 0.5923042424023152,
+ "num_tokens": 348621.0,
+ "step": 310
+ },
+ {
+ "epoch": 0.28444444444444444,
+ "grad_norm": 1.9662351608276367,
+ "learning_rate": 0.00018668702290076337,
+ "loss": 2.2639,
+ "mean_token_accuracy": 0.588193366676569,
+ "num_tokens": 360272.0,
+ "step": 320
+ },
+ {
+ "epoch": 0.29333333333333333,
+ "grad_norm": 1.6725891828536987,
+ "learning_rate": 0.0001860763358778626,
+ "loss": 2.2249,
+ "mean_token_accuracy": 0.6054098337888718,
+ "num_tokens": 371346.0,
+ "step": 330
+ },
+ {
+ "epoch": 0.3022222222222222,
+ "grad_norm": 1.68416166305542,
+ "learning_rate": 0.00018546564885496184,
+ "loss": 2.1678,
+ "mean_token_accuracy": 0.6146526508033275,
+ "num_tokens": 382779.0,
+ "step": 340
+ },
+ {
+ "epoch": 0.3111111111111111,
+ "grad_norm": 1.7218507528305054,
+ "learning_rate": 0.00018485496183206108,
+ "loss": 2.2011,
+ "mean_token_accuracy": 0.6104303196072578,
+ "num_tokens": 393823.0,
+ "step": 350
+ },
+ {
+ "epoch": 0.32,
+ "grad_norm": 1.6817256212234497,
+ "learning_rate": 0.0001842442748091603,
+ "loss": 2.2264,
+ "mean_token_accuracy": 0.5987282857298851,
+ "num_tokens": 405438.0,
+ "step": 360
+ },
+ {
+ "epoch": 0.3288888888888889,
+ "grad_norm": 1.7454718351364136,
+ "learning_rate": 0.00018363358778625955,
+ "loss": 2.2712,
+ "mean_token_accuracy": 0.5939777493476868,
+ "num_tokens": 417299.0,
+ "step": 370
+ },
+ {
+ "epoch": 0.3377777777777778,
+ "grad_norm": 2.011315107345581,
+ "learning_rate": 0.00018302290076335878,
+ "loss": 2.2247,
+ "mean_token_accuracy": 0.6061037018895149,
+ "num_tokens": 428660.0,
+ "step": 380
+ },
+ {
+ "epoch": 0.3466666666666667,
+ "grad_norm": 1.6242053508758545,
+ "learning_rate": 0.00018241221374045802,
+ "loss": 2.232,
+ "mean_token_accuracy": 0.6062197655439376,
+ "num_tokens": 439768.0,
+ "step": 390
+ },
+ {
+ "epoch": 0.35555555555555557,
+ "grad_norm": 1.9328559637069702,
+ "learning_rate": 0.00018180152671755725,
+ "loss": 2.1291,
+ "mean_token_accuracy": 0.6168317429721355,
+ "num_tokens": 450808.0,
380
+ "step": 400
381
+ },
382
+ {
383
+ "epoch": 0.35555555555555557,
384
+ "eval_loss": 2.1662538051605225,
385
+ "eval_mean_token_accuracy": 0.6099509916305542,
386
+ "eval_num_tokens": 450808.0,
387
+ "eval_runtime": 49.4213,
388
+ "eval_samples_per_second": 20.234,
389
+ "eval_steps_per_second": 10.117,
390
+ "step": 400
391
+ },
392
+ {
393
+ "epoch": 0.36444444444444446,
394
+ "grad_norm": 1.8797143697738647,
395
+ "learning_rate": 0.0001811908396946565,
396
+ "loss": 2.2086,
397
+ "mean_token_accuracy": 0.6012695133686066,
398
+ "num_tokens": 461592.0,
399
+ "step": 410
400
+ },
401
+ {
402
+ "epoch": 0.37333333333333335,
403
+ "grad_norm": 1.7558225393295288,
404
+ "learning_rate": 0.00018058015267175575,
405
+ "loss": 2.1771,
406
+ "mean_token_accuracy": 0.6060668036341668,
407
+ "num_tokens": 473434.0,
408
+ "step": 420
409
+ },
410
+ {
411
+ "epoch": 0.38222222222222224,
412
+ "grad_norm": 1.845051884651184,
413
+ "learning_rate": 0.00017996946564885496,
414
+ "loss": 2.2576,
415
+ "mean_token_accuracy": 0.5929104581475257,
416
+ "num_tokens": 485130.0,
417
+ "step": 430
418
+ },
419
+ {
420
+ "epoch": 0.39111111111111113,
421
+ "grad_norm": 1.6992298364639282,
422
+ "learning_rate": 0.0001793587786259542,
423
+ "loss": 2.1815,
424
+ "mean_token_accuracy": 0.6100690707564353,
425
+ "num_tokens": 496482.0,
426
+ "step": 440
427
+ },
428
+ {
429
+ "epoch": 0.4,
430
+ "grad_norm": 1.7239253520965576,
431
+ "learning_rate": 0.00017874809160305343,
432
+ "loss": 2.2082,
433
+ "mean_token_accuracy": 0.6001435503363609,
434
+ "num_tokens": 508218.0,
435
+ "step": 450
436
+ },
437
+ {
438
+ "epoch": 0.4088888888888889,
439
+ "grad_norm": 1.7856336832046509,
440
+ "learning_rate": 0.0001781374045801527,
441
+ "loss": 2.1593,
442
+ "mean_token_accuracy": 0.6118309393525123,
443
+ "num_tokens": 519379.0,
444
+ "step": 460
445
+ },
446
+ {
447
+ "epoch": 0.4177777777777778,
448
+ "grad_norm": 1.611831545829773,
449
+ "learning_rate": 0.00017752671755725193,
450
+ "loss": 2.1797,
451
+ "mean_token_accuracy": 0.6033190444111824,
452
+ "num_tokens": 530561.0,
453
+ "step": 470
454
+ },
455
+ {
456
+ "epoch": 0.4266666666666667,
457
+ "grad_norm": 1.7420586347579956,
458
+ "learning_rate": 0.00017691603053435114,
459
+ "loss": 2.2027,
460
+ "mean_token_accuracy": 0.6067790001630783,
461
+ "num_tokens": 542631.0,
462
+ "step": 480
463
+ },
464
+ {
465
+ "epoch": 0.43555555555555553,
466
+ "grad_norm": 1.948723316192627,
467
+ "learning_rate": 0.00017630534351145038,
468
+ "loss": 2.1753,
469
+ "mean_token_accuracy": 0.6109650492668152,
470
+ "num_tokens": 553477.0,
471
+ "step": 490
472
+ },
473
+ {
474
+ "epoch": 0.4444444444444444,
475
+ "grad_norm": 1.7983819246292114,
476
+ "learning_rate": 0.00017569465648854964,
477
+ "loss": 2.158,
478
+ "mean_token_accuracy": 0.5996212616562844,
479
+ "num_tokens": 565400.0,
480
+ "step": 500
481
+ },
482
+ {
483
+ "epoch": 0.4533333333333333,
484
+ "grad_norm": 1.842372179031372,
485
+ "learning_rate": 0.00017508396946564888,
486
+ "loss": 2.0825,
487
+ "mean_token_accuracy": 0.6168116196990013,
488
+ "num_tokens": 576953.0,
489
+ "step": 510
490
+ },
491
+ {
492
+ "epoch": 0.4622222222222222,
493
+ "grad_norm": 1.91799795627594,
494
+ "learning_rate": 0.00017447328244274809,
495
+ "loss": 2.1022,
496
+ "mean_token_accuracy": 0.6168905258178711,
+ "num_tokens": 588003.0,
+ "step": 520
+ },
+ {
+ "epoch": 0.4711111111111111,
+ "grad_norm": 1.7727124691009521,
+ "learning_rate": 0.00017386259541984732,
+ "loss": 2.1695,
+ "mean_token_accuracy": 0.5997609972953797,
+ "num_tokens": 600043.0,
+ "step": 530
+ },
+ {
+ "epoch": 0.48,
+ "grad_norm": 1.8602296113967896,
+ "learning_rate": 0.00017325190839694658,
+ "loss": 2.0849,
+ "mean_token_accuracy": 0.6266478568315506,
+ "num_tokens": 610974.0,
+ "step": 540
+ },
+ {
+ "epoch": 0.4888888888888889,
+ "grad_norm": 1.545620083808899,
+ "learning_rate": 0.00017264122137404582,
+ "loss": 2.1824,
+ "mean_token_accuracy": 0.6072694823145867,
+ "num_tokens": 622632.0,
+ "step": 550
+ },
+ {
+ "epoch": 0.49777777777777776,
+ "grad_norm": 1.7485988140106201,
+ "learning_rate": 0.00017203053435114506,
+ "loss": 2.1374,
+ "mean_token_accuracy": 0.6164417043328285,
+ "num_tokens": 634093.0,
+ "step": 560
+ },
+ {
+ "epoch": 0.5066666666666667,
+ "grad_norm": 1.8591196537017822,
+ "learning_rate": 0.00017141984732824426,
+ "loss": 2.0928,
+ "mean_token_accuracy": 0.6241554819047451,
+ "num_tokens": 645226.0,
+ "step": 570
+ },
+ {
+ "epoch": 0.5155555555555555,
+ "grad_norm": 1.8163517713546753,
+ "learning_rate": 0.00017080916030534353,
+ "loss": 2.0476,
+ "mean_token_accuracy": 0.6285594403743744,
+ "num_tokens": 656188.0,
+ "step": 580
+ },
+ {
+ "epoch": 0.5244444444444445,
+ "grad_norm": 1.7729696035385132,
+ "learning_rate": 0.00017019847328244276,
+ "loss": 2.1036,
+ "mean_token_accuracy": 0.6208315283060074,
+ "num_tokens": 667642.0,
+ "step": 590
+ },
+ {
+ "epoch": 0.5333333333333333,
+ "grad_norm": 1.7804032564163208,
+ "learning_rate": 0.000169587786259542,
+ "loss": 2.1174,
+ "mean_token_accuracy": 0.6148250237107277,
+ "num_tokens": 678769.0,
+ "step": 600
+ },
+ {
+ "epoch": 0.5333333333333333,
+ "eval_loss": 2.0850696563720703,
+ "eval_mean_token_accuracy": 0.6197466601729393,
+ "eval_num_tokens": 678769.0,
+ "eval_runtime": 49.7611,
+ "eval_samples_per_second": 20.096,
+ "eval_steps_per_second": 10.048,
+ "step": 600
+ },
+ {
+ "epoch": 0.5422222222222223,
+ "grad_norm": 1.8643274307250977,
+ "learning_rate": 0.00016897709923664124,
+ "loss": 2.0485,
+ "mean_token_accuracy": 0.6331146821379662,
+ "num_tokens": 690014.0,
+ "step": 610
+ },
+ {
+ "epoch": 0.5511111111111111,
+ "grad_norm": 1.8060939311981201,
+ "learning_rate": 0.00016836641221374047,
+ "loss": 2.1117,
+ "mean_token_accuracy": 0.612041813135147,
+ "num_tokens": 701734.0,
+ "step": 620
+ },
+ {
+ "epoch": 0.56,
+ "grad_norm": 1.7059085369110107,
+ "learning_rate": 0.0001677557251908397,
+ "loss": 2.0747,
+ "mean_token_accuracy": 0.6174572542309761,
+ "num_tokens": 713570.0,
+ "step": 630
+ },
+ {
+ "epoch": 0.5688888888888889,
+ "grad_norm": 1.6600592136383057,
+ "learning_rate": 0.00016714503816793894,
+ "loss": 2.0685,
+ "mean_token_accuracy": 0.6293445661664009,
+ "num_tokens": 724815.0,
+ "step": 640
+ },
+ {
+ "epoch": 0.5777777777777777,
+ "grad_norm": 1.6598913669586182,
+ "learning_rate": 0.00016653435114503818,
+ "loss": 2.0255,
+ "mean_token_accuracy": 0.6309839904308319,
+ "num_tokens": 735777.0,
+ "step": 650
+ },
+ {
+ "epoch": 0.5866666666666667,
+ "grad_norm": 1.8306963443756104,
+ "learning_rate": 0.00016592366412213741,
+ "loss": 2.1249,
+ "mean_token_accuracy": 0.6147443532943726,
+ "num_tokens": 746903.0,
+ "step": 660
+ },
+ {
+ "epoch": 0.5955555555555555,
+ "grad_norm": 1.626795768737793,
+ "learning_rate": 0.00016531297709923665,
+ "loss": 2.0694,
+ "mean_token_accuracy": 0.6254988595843315,
+ "num_tokens": 757881.0,
+ "step": 670
+ },
+ {
+ "epoch": 0.6044444444444445,
+ "grad_norm": 1.710806131362915,
+ "learning_rate": 0.00016470229007633589,
+ "loss": 2.0397,
+ "mean_token_accuracy": 0.6233279958367348,
+ "num_tokens": 768982.0,
+ "step": 680
+ },
+ {
+ "epoch": 0.6133333333333333,
+ "grad_norm": 1.7051280736923218,
+ "learning_rate": 0.00016409160305343512,
+ "loss": 2.116,
+ "mean_token_accuracy": 0.6183760315179825,
+ "num_tokens": 780072.0,
+ "step": 690
+ },
+ {
+ "epoch": 0.6222222222222222,
+ "grad_norm": 1.607917070388794,
+ "learning_rate": 0.00016348091603053436,
+ "loss": 2.0478,
+ "mean_token_accuracy": 0.6331974640488625,
+ "num_tokens": 791061.0,
+ "step": 700
+ },
+ {
+ "epoch": 0.6311111111111111,
+ "grad_norm": 1.7803592681884766,
+ "learning_rate": 0.0001628702290076336,
+ "loss": 2.0595,
+ "mean_token_accuracy": 0.6249041527509689,
+ "num_tokens": 801867.0,
+ "step": 710
+ },
+ {
+ "epoch": 0.64,
+ "grad_norm": 1.6132373809814453,
+ "learning_rate": 0.00016225954198473283,
+ "loss": 2.0789,
+ "mean_token_accuracy": 0.6235784366726875,
+ "num_tokens": 813112.0,
+ "step": 720
+ },
+ {
+ "epoch": 0.6488888888888888,
+ "grad_norm": 1.790528655052185,
+ "learning_rate": 0.00016164885496183207,
+ "loss": 2.0632,
+ "mean_token_accuracy": 0.6268924325704575,
+ "num_tokens": 824133.0,
+ "step": 730
+ },
+ {
+ "epoch": 0.6577777777777778,
+ "grad_norm": 2.0007362365722656,
+ "learning_rate": 0.0001610381679389313,
+ "loss": 2.0701,
+ "mean_token_accuracy": 0.6189413338899612,
+ "num_tokens": 835469.0,
+ "step": 740
+ },
+ {
+ "epoch": 0.6666666666666666,
+ "grad_norm": 2.227158546447754,
+ "learning_rate": 0.00016042748091603054,
+ "loss": 2.0339,
+ "mean_token_accuracy": 0.621903920173645,
+ "num_tokens": 846572.0,
+ "step": 750
+ },
+ {
+ "epoch": 0.6755555555555556,
+ "grad_norm": 1.80472731590271,
+ "learning_rate": 0.00015981679389312977,
+ "loss": 2.1285,
+ "mean_token_accuracy": 0.604806374013424,
+ "num_tokens": 857795.0,
+ "step": 760
+ },
+ {
+ "epoch": 0.6844444444444444,
+ "grad_norm": 1.7893937826156616,
+ "learning_rate": 0.000159206106870229,
+ "loss": 2.0347,
+ "mean_token_accuracy": 0.6292635962367058,
+ "num_tokens": 868429.0,
+ "step": 770
+ },
+ {
+ "epoch": 0.6933333333333334,
+ "grad_norm": 1.6761573553085327,
+ "learning_rate": 0.00015859541984732824,
+ "loss": 2.0591,
+ "mean_token_accuracy": 0.6254431992769242,
+ "num_tokens": 879659.0,
+ "step": 780
+ },
+ {
+ "epoch": 0.7022222222222222,
+ "grad_norm": 1.803045630455017,
+ "learning_rate": 0.0001579847328244275,
+ "loss": 2.0293,
+ "mean_token_accuracy": 0.6273573949933052,
+ "num_tokens": 890911.0,
+ "step": 790
+ },
+ {
+ "epoch": 0.7111111111111111,
+ "grad_norm": 1.7385220527648926,
+ "learning_rate": 0.00015737404580152672,
+ "loss": 2.0197,
+ "mean_token_accuracy": 0.63025072067976,
+ "num_tokens": 902240.0,
+ "step": 800
+ },
+ {
+ "epoch": 0.7111111111111111,
+ "eval_loss": 2.0297935009002686,
+ "eval_mean_token_accuracy": 0.628437293112278,
+ "eval_num_tokens": 902240.0,
+ "eval_runtime": 49.3011,
+ "eval_samples_per_second": 20.284,
+ "eval_steps_per_second": 10.142,
+ "step": 800
+ },
+ {
+ "epoch": 0.72,
+ "grad_norm": 1.8906656503677368,
+ "learning_rate": 0.00015676335877862595,
+ "loss": 2.0806,
+ "mean_token_accuracy": 0.619849094748497,
+ "num_tokens": 914009.0,
+ "step": 810
+ },
+ {
+ "epoch": 0.7288888888888889,
+ "grad_norm": 1.714268684387207,
+ "learning_rate": 0.0001561526717557252,
+ "loss": 2.0343,
+ "mean_token_accuracy": 0.632188580930233,
+ "num_tokens": 925091.0,
+ "step": 820
+ },
+ {
+ "epoch": 0.7377777777777778,
+ "grad_norm": 1.833918809890747,
+ "learning_rate": 0.00015554198473282445,
+ "loss": 2.0747,
+ "mean_token_accuracy": 0.6280180156230927,
+ "num_tokens": 936675.0,
+ "step": 830
+ },
+ {
+ "epoch": 0.7466666666666667,
+ "grad_norm": 1.9817575216293335,
+ "learning_rate": 0.00015493129770992366,
+ "loss": 2.0859,
+ "mean_token_accuracy": 0.6128378361463547,
+ "num_tokens": 948151.0,
+ "step": 840
+ },
+ {
+ "epoch": 0.7555555555555555,
+ "grad_norm": 1.5982656478881836,
+ "learning_rate": 0.0001543206106870229,
+ "loss": 2.0455,
+ "mean_token_accuracy": 0.6276382938027382,
+ "num_tokens": 959266.0,
+ "step": 850
+ },
+ {
+ "epoch": 0.7644444444444445,
+ "grad_norm": 1.7298970222473145,
+ "learning_rate": 0.00015370992366412213,
+ "loss": 1.9604,
+ "mean_token_accuracy": 0.6377590849995614,
+ "num_tokens": 970339.0,
+ "step": 860
+ },
+ {
+ "epoch": 0.7733333333333333,
+ "grad_norm": 1.8064581155776978,
+ "learning_rate": 0.0001530992366412214,
+ "loss": 2.0698,
+ "mean_token_accuracy": 0.6194617792963981,
+ "num_tokens": 981805.0,
+ "step": 870
+ },
+ {
+ "epoch": 0.7822222222222223,
+ "grad_norm": 1.5860410928726196,
+ "learning_rate": 0.00015248854961832063,
+ "loss": 2.0182,
+ "mean_token_accuracy": 0.6292306095361709,
+ "num_tokens": 993552.0,
+ "step": 880
+ },
+ {
+ "epoch": 0.7911111111111111,
+ "grad_norm": 1.8761259317398071,
+ "learning_rate": 0.00015187786259541984,
+ "loss": 2.0335,
+ "mean_token_accuracy": 0.6285651385784149,
+ "num_tokens": 1004400.0,
+ "step": 890
+ },
+ {
+ "epoch": 0.8,
+ "grad_norm": 1.6973590850830078,
+ "learning_rate": 0.00015126717557251908,
+ "loss": 2.0927,
+ "mean_token_accuracy": 0.6183614790439605,
+ "num_tokens": 1015564.0,
+ "step": 900
+ },
+ {
+ "epoch": 0.8088888888888889,
+ "grad_norm": 1.6477675437927246,
+ "learning_rate": 0.00015065648854961834,
+ "loss": 1.9187,
+ "mean_token_accuracy": 0.6427812784910202,
+ "num_tokens": 1026849.0,
+ "step": 910
+ },
+ {
+ "epoch": 0.8177777777777778,
+ "grad_norm": 1.6942589282989502,
+ "learning_rate": 0.00015004580152671757,
+ "loss": 2.0139,
+ "mean_token_accuracy": 0.6322552219033242,
+ "num_tokens": 1037721.0,
+ "step": 920
+ },
+ {
+ "epoch": 0.8266666666666667,
+ "grad_norm": 1.6394822597503662,
+ "learning_rate": 0.0001494351145038168,
+ "loss": 2.0392,
+ "mean_token_accuracy": 0.6273665294051171,
+ "num_tokens": 1048986.0,
+ "step": 930
+ },
+ {
+ "epoch": 0.8355555555555556,
+ "grad_norm": 1.697804570198059,
+ "learning_rate": 0.00014882442748091602,
+ "loss": 2.0412,
+ "mean_token_accuracy": 0.625536386668682,
+ "num_tokens": 1060627.0,
+ "step": 940
+ },
+ {
+ "epoch": 0.8444444444444444,
+ "grad_norm": 1.8058092594146729,
+ "learning_rate": 0.00014821374045801528,
+ "loss": 1.9737,
+ "mean_token_accuracy": 0.6332821652293206,
+ "num_tokens": 1071482.0,
+ "step": 950
+ },
+ {
+ "epoch": 0.8533333333333334,
+ "grad_norm": 1.773294448852539,
+ "learning_rate": 0.00014760305343511452,
+ "loss": 2.054,
+ "mean_token_accuracy": 0.6256278708577157,
+ "num_tokens": 1082672.0,
+ "step": 960
+ },
+ {
+ "epoch": 0.8622222222222222,
+ "grad_norm": 1.6936707496643066,
+ "learning_rate": 0.00014699236641221375,
+ "loss": 1.9957,
+ "mean_token_accuracy": 0.6333451583981514,
+ "num_tokens": 1093493.0,
+ "step": 970
+ },
+ {
+ "epoch": 0.8711111111111111,
+ "grad_norm": 1.7029008865356445,
+ "learning_rate": 0.000146381679389313,
+ "loss": 2.0526,
+ "mean_token_accuracy": 0.6244132176041604,
+ "num_tokens": 1104857.0,
+ "step": 980
+ },
+ {
+ "epoch": 0.88,
+ "grad_norm": 1.8421082496643066,
+ "learning_rate": 0.00014577099236641223,
+ "loss": 2.0311,
+ "mean_token_accuracy": 0.6236826583743096,
+ "num_tokens": 1116131.0,
+ "step": 990
+ },
+ {
+ "epoch": 0.8888888888888888,
+ "grad_norm": 1.646053433418274,
+ "learning_rate": 0.00014516030534351146,
+ "loss": 1.9973,
+ "mean_token_accuracy": 0.6274659112095833,
+ "num_tokens": 1127612.0,
+ "step": 1000
+ },
+ {
+ "epoch": 0.8888888888888888,
+ "eval_loss": 1.989682674407959,
+ "eval_mean_token_accuracy": 0.633990108013153,
+ "eval_num_tokens": 1127612.0,
+ "eval_runtime": 49.3043,
+ "eval_samples_per_second": 20.282,
+ "eval_steps_per_second": 10.141,
+ "step": 1000
+ },
+ {
+ "epoch": 0.8977777777777778,
+ "grad_norm": 1.5941271781921387,
+ "learning_rate": 0.0001445496183206107,
+ "loss": 2.0579,
+ "mean_token_accuracy": 0.6256210282444954,
+ "num_tokens": 1138866.0,
+ "step": 1010
+ },
+ {
+ "epoch": 0.9066666666666666,
+ "grad_norm": 1.7826253175735474,
+ "learning_rate": 0.00014393893129770993,
+ "loss": 1.9866,
+ "mean_token_accuracy": 0.6332772478461266,
+ "num_tokens": 1150411.0,
+ "step": 1020
+ },
+ {
+ "epoch": 0.9155555555555556,
+ "grad_norm": 1.8722221851348877,
+ "learning_rate": 0.00014332824427480917,
+ "loss": 2.0398,
+ "mean_token_accuracy": 0.627329595386982,
+ "num_tokens": 1161360.0,
+ "step": 1030
+ },
+ {
+ "epoch": 0.9244444444444444,
+ "grad_norm": 1.6533294916152954,
+ "learning_rate": 0.0001427175572519084,
+ "loss": 2.0271,
+ "mean_token_accuracy": 0.6259514302015304,
+ "num_tokens": 1172683.0,
+ "step": 1040
+ },
+ {
+ "epoch": 0.9333333333333333,
+ "grad_norm": 1.5746543407440186,
+ "learning_rate": 0.00014210687022900764,
+ "loss": 1.9634,
+ "mean_token_accuracy": 0.6359310179948807,
+ "num_tokens": 1183277.0,
+ "step": 1050
+ },
+ {
+ "epoch": 0.9422222222222222,
+ "grad_norm": 1.6094276905059814,
+ "learning_rate": 0.00014149618320610688,
+ "loss": 1.9195,
+ "mean_token_accuracy": 0.649330523610115,
+ "num_tokens": 1194160.0,
+ "step": 1060
+ },
+ {
+ "epoch": 0.9511111111111111,
+ "grad_norm": 1.9643882513046265,
+ "learning_rate": 0.0001408854961832061,
+ "loss": 2.0042,
+ "mean_token_accuracy": 0.6356254667043686,
+ "num_tokens": 1205308.0,
+ "step": 1070
+ },
+ {
+ "epoch": 0.96,
+ "grad_norm": 1.8238948583602905,
+ "learning_rate": 0.00014027480916030535,
+ "loss": 1.9172,
+ "mean_token_accuracy": 0.6497033536434174,
+ "num_tokens": 1215760.0,
+ "step": 1080
+ },
+ {
+ "epoch": 0.9688888888888889,
+ "grad_norm": 1.7422380447387695,
+ "learning_rate": 0.00013966412213740458,
+ "loss": 2.0213,
+ "mean_token_accuracy": 0.6309294819831848,
+ "num_tokens": 1226775.0,
+ "step": 1090
+ },
+ {
+ "epoch": 0.9777777777777777,
+ "grad_norm": 1.651795744895935,
+ "learning_rate": 0.00013905343511450382,
+ "loss": 2.033,
+ "mean_token_accuracy": 0.6295390352606773,
+ "num_tokens": 1238191.0,
+ "step": 1100
+ },
+ {
+ "epoch": 0.9866666666666667,
+ "grad_norm": 1.673543095588684,
+ "learning_rate": 0.00013844274809160308,
+ "loss": 2.0085,
+ "mean_token_accuracy": 0.6329691678285598,
+ "num_tokens": 1249561.0,
+ "step": 1110
+ },
+ {
+ "epoch": 0.9955555555555555,
+ "grad_norm": 1.7423163652420044,
+ "learning_rate": 0.0001378320610687023,
+ "loss": 1.9751,
+ "mean_token_accuracy": 0.6307685926556588,
+ "num_tokens": 1260429.0,
+ "step": 1120
+ },
+ {
+ "epoch": 1.0044444444444445,
+ "grad_norm": 1.4878981113433838,
+ "learning_rate": 0.00013722137404580153,
+ "loss": 1.9171,
+ "mean_token_accuracy": 0.644737622141838,
+ "num_tokens": 1271111.0,
+ "step": 1130
+ },
+ {
+ "epoch": 1.0133333333333334,
+ "grad_norm": 1.5343797206878662,
+ "learning_rate": 0.00013661068702290076,
+ "loss": 1.8544,
+ "mean_token_accuracy": 0.6503374725580215,
+ "num_tokens": 1282434.0,
+ "step": 1140
+ },
+ {
+ "epoch": 1.0222222222222221,
+ "grad_norm": 1.5450340509414673,
+ "learning_rate": 0.00013600000000000003,
+ "loss": 1.828,
+ "mean_token_accuracy": 0.6514182686805725,
+ "num_tokens": 1294382.0,
+ "step": 1150
+ },
+ {
+ "epoch": 1.031111111111111,
+ "grad_norm": 1.8313877582550049,
+ "learning_rate": 0.00013538931297709923,
+ "loss": 1.7704,
+ "mean_token_accuracy": 0.6693721905350685,
+ "num_tokens": 1305343.0,
+ "step": 1160
+ },
+ {
+ "epoch": 1.04,
+ "grad_norm": 1.8418430089950562,
+ "learning_rate": 0.00013477862595419847,
+ "loss": 1.7591,
+ "mean_token_accuracy": 0.67226582467556,
+ "num_tokens": 1316558.0,
+ "step": 1170
+ },
+ {
+ "epoch": 1.048888888888889,
+ "grad_norm": 1.6022825241088867,
+ "learning_rate": 0.0001341679389312977,
+ "loss": 1.8048,
+ "mean_token_accuracy": 0.6629651457071304,
+ "num_tokens": 1327938.0,
+ "step": 1180
+ },
+ {
+ "epoch": 1.0577777777777777,
+ "grad_norm": 1.5888707637786865,
+ "learning_rate": 0.00013355725190839697,
+ "loss": 1.773,
+ "mean_token_accuracy": 0.6730352655053139,
+ "num_tokens": 1338732.0,
+ "step": 1190
+ },
+ {
+ "epoch": 1.0666666666666667,
+ "grad_norm": 1.833946943283081,
+ "learning_rate": 0.0001329465648854962,
+ "loss": 1.7887,
+ "mean_token_accuracy": 0.6616317644715309,
+ "num_tokens": 1350096.0,
+ "step": 1200
+ },
+ {
+ "epoch": 1.0666666666666667,
+ "eval_loss": 1.9697085618972778,
+ "eval_mean_token_accuracy": 0.6378205664157868,
+ "eval_num_tokens": 1350096.0,
+ "eval_runtime": 49.9237,
+ "eval_samples_per_second": 20.031,
+ "eval_steps_per_second": 10.015,
+ "step": 1200
+ },
+ {
+ "epoch": 1.0755555555555556,
+ "grad_norm": 1.6338160037994385,
+ "learning_rate": 0.00013233587786259541,
+ "loss": 1.7889,
+ "mean_token_accuracy": 0.6668319672346115,
+ "num_tokens": 1360771.0,
+ "step": 1210
+ },
+ {
+ "epoch": 1.0844444444444445,
+ "grad_norm": 1.8737561702728271,
+ "learning_rate": 0.00013172519083969465,
+ "loss": 1.7997,
+ "mean_token_accuracy": 0.6570939287543297,
+ "num_tokens": 1372450.0,
+ "step": 1220
+ },
+ {
+ "epoch": 1.0933333333333333,
+ "grad_norm": 1.758074402809143,
+ "learning_rate": 0.0001311145038167939,
+ "loss": 1.8457,
+ "mean_token_accuracy": 0.653074924647808,
+ "num_tokens": 1383711.0,
+ "step": 1230
+ },
+ {
+ "epoch": 1.1022222222222222,
+ "grad_norm": 1.839158296585083,
+ "learning_rate": 0.00013050381679389315,
+ "loss": 1.8013,
+ "mean_token_accuracy": 0.6608111187815666,
+ "num_tokens": 1394856.0,
+ "step": 1240
+ },
+ {
+ "epoch": 1.1111111111111112,
+ "grad_norm": 1.733567476272583,
+ "learning_rate": 0.00012989312977099238,
+ "loss": 1.7814,
+ "mean_token_accuracy": 0.6655508041381836,
+ "num_tokens": 1406193.0,
+ "step": 1250
+ },
+ {
+ "epoch": 1.12,
+ "grad_norm": 1.6274900436401367,
+ "learning_rate": 0.0001292824427480916,
+ "loss": 1.858,
+ "mean_token_accuracy": 0.6488608077168465,
+ "num_tokens": 1417607.0,
+ "step": 1260
+ },
+ {
+ "epoch": 1.1288888888888888,
+ "grad_norm": 1.690090537071228,
+ "learning_rate": 0.00012867175572519086,
+ "loss": 1.8256,
+ "mean_token_accuracy": 0.6595686703920365,
+ "num_tokens": 1429073.0,
+ "step": 1270
+ },
+ {
+ "epoch": 1.1377777777777778,
+ "grad_norm": 1.6638071537017822,
+ "learning_rate": 0.0001280610687022901,
+ "loss": 1.8334,
+ "mean_token_accuracy": 0.6580470725893974,
+ "num_tokens": 1440194.0,
+ "step": 1280
+ },
+ {
+ "epoch": 1.1466666666666667,
+ "grad_norm": 1.8339307308197021,
+ "learning_rate": 0.00012745038167938933,
+ "loss": 1.783,
+ "mean_token_accuracy": 0.6632378786802292,
+ "num_tokens": 1451221.0,
+ "step": 1290
+ },
+ {
+ "epoch": 1.1555555555555554,
+ "grad_norm": 1.7621415853500366,
+ "learning_rate": 0.00012683969465648854,
+ "loss": 1.844,
+ "mean_token_accuracy": 0.6506654173135757,
+ "num_tokens": 1462493.0,
+ "step": 1300
+ },
+ {
+ "epoch": 1.1644444444444444,
+ "grad_norm": 1.7811567783355713,
+ "learning_rate": 0.00012622900763358777,
+ "loss": 1.8235,
+ "mean_token_accuracy": 0.6505810797214509,
+ "num_tokens": 1473710.0,
+ "step": 1310
+ },
+ {
+ "epoch": 1.1733333333333333,
+ "grad_norm": 1.9157836437225342,
+ "learning_rate": 0.00012561832061068704,
+ "loss": 1.8885,
+ "mean_token_accuracy": 0.6459546625614166,
+ "num_tokens": 1485215.0,
+ "step": 1320
+ },
+ {
+ "epoch": 1.1822222222222223,
+ "grad_norm": 1.6572569608688354,
+ "learning_rate": 0.00012500763358778627,
+ "loss": 1.813,
+ "mean_token_accuracy": 0.6597578257322312,
+ "num_tokens": 1496371.0,
+ "step": 1330
+ },
+ {
+ "epoch": 1.1911111111111112,
+ "grad_norm": 1.8602449893951416,
+ "learning_rate": 0.0001243969465648855,
+ "loss": 1.8179,
+ "mean_token_accuracy": 0.6519266426563263,
+ "num_tokens": 1508348.0,
+ "step": 1340
+ },
+ {
+ "epoch": 1.2,
+ "grad_norm": 1.8736369609832764,
+ "learning_rate": 0.00012378625954198472,
+ "loss": 1.8029,
+ "mean_token_accuracy": 0.6621162816882133,
+ "num_tokens": 1519322.0,
+ "step": 1350
+ },
+ {
+ "epoch": 1.208888888888889,
+ "grad_norm": 2.026744842529297,
+ "learning_rate": 0.00012317557251908398,
+ "loss": 1.8168,
+ "mean_token_accuracy": 0.6635635286569596,
+ "num_tokens": 1530183.0,
+ "step": 1360
+ },
+ {
+ "epoch": 1.2177777777777778,
+ "grad_norm": 1.7360782623291016,
+ "learning_rate": 0.00012256488549618322,
+ "loss": 1.7521,
+ "mean_token_accuracy": 0.6706348299980164,
+ "num_tokens": 1540862.0,
+ "step": 1370
+ },
+ {
+ "epoch": 1.2266666666666666,
+ "grad_norm": 1.9620578289031982,
+ "learning_rate": 0.00012195419847328244,
+ "loss": 1.8228,
+ "mean_token_accuracy": 0.6569086670875549,
+ "num_tokens": 1552212.0,
+ "step": 1380
+ },
+ {
+ "epoch": 1.2355555555555555,
+ "grad_norm": 1.6294327974319458,
+ "learning_rate": 0.00012134351145038167,
+ "loss": 1.7654,
+ "mean_token_accuracy": 0.6697377026081085,
+ "num_tokens": 1563356.0,
+ "step": 1390
+ },
+ {
+ "epoch": 1.2444444444444445,
+ "grad_norm": 1.7311524152755737,
+ "learning_rate": 0.00012073282442748092,
+ "loss": 1.9019,
+ "mean_token_accuracy": 0.6457875579595566,
+ "num_tokens": 1574569.0,
+ "step": 1400
+ },
+ {
+ "epoch": 1.2444444444444445,
+ "eval_loss": 1.9411770105361938,
+ "eval_mean_token_accuracy": 0.6407178282737732,
+ "eval_num_tokens": 1574569.0,
+ "eval_runtime": 48.3309,
+ "eval_samples_per_second": 20.691,
+ "eval_steps_per_second": 10.345,
+ "step": 1400
+ },
+ {
+ "epoch": 1.2533333333333334,
+ "grad_norm": 1.8629728555679321,
+ "learning_rate": 0.00012012213740458016,
+ "loss": 1.7585,
+ "mean_token_accuracy": 0.671015702188015,
+ "num_tokens": 1585308.0,
+ "step": 1410
+ },
+ {
+ "epoch": 1.2622222222222224,
+ "grad_norm": 1.958808183670044,
+ "learning_rate": 0.0001195114503816794,
+ "loss": 1.8479,
+ "mean_token_accuracy": 0.6535898372530937,
+ "num_tokens": 1596886.0,
+ "step": 1420
+ },
+ {
+ "epoch": 1.271111111111111,
+ "grad_norm": 1.950421690940857,
+ "learning_rate": 0.00011890076335877862,
+ "loss": 1.8173,
+ "mean_token_accuracy": 0.6655478686094284,
+ "num_tokens": 1607683.0,
+ "step": 1430
+ },
+ {
+ "epoch": 1.28,
+ "grad_norm": 1.8152872323989868,
+ "learning_rate": 0.00011829007633587788,
+ "loss": 1.8791,
+ "mean_token_accuracy": 0.6531546950340271,
+ "num_tokens": 1618906.0,
+ "step": 1440
+ },
+ {
+ "epoch": 1.2888888888888888,
+ "grad_norm": 1.7857719659805298,
+ "learning_rate": 0.0001176793893129771,
+ "loss": 1.7887,
+ "mean_token_accuracy": 0.6610255971550941,
+ "num_tokens": 1629981.0,
+ "step": 1450
+ },
+ {
+ "epoch": 1.2977777777777777,
+ "grad_norm": 1.8434971570968628,
+ "learning_rate": 0.00011706870229007634,
+ "loss": 1.8368,
+ "mean_token_accuracy": 0.653369964659214,
+ "num_tokens": 1641429.0,
+ "step": 1460
+ },
+ {
+ "epoch": 1.3066666666666666,
+ "grad_norm": 1.8877320289611816,
+ "learning_rate": 0.00011645801526717557,
+ "loss": 1.7938,
+ "mean_token_accuracy": 0.6639183640480042,
+ "num_tokens": 1652601.0,
+ "step": 1470
+ },
+ {
+ "epoch": 1.3155555555555556,
+ "grad_norm": 1.8121625185012817,
+ "learning_rate": 0.00011584732824427482,
+ "loss": 1.7862,
+ "mean_token_accuracy": 0.661414910852909,
+ "num_tokens": 1663837.0,
+ "step": 1480
+ },
+ {
+ "epoch": 1.3244444444444445,
+ "grad_norm": 1.7919855117797852,
+ "learning_rate": 0.00011523664122137406,
+ "loss": 1.8148,
+ "mean_token_accuracy": 0.6654411420226097,
+ "num_tokens": 1675018.0,
+ "step": 1490
+ },
+ {
+ "epoch": 1.3333333333333333,
+ "grad_norm": 1.828735589981079,
+ "learning_rate": 0.00011462595419847328,
+ "loss": 1.8456,
+ "mean_token_accuracy": 0.6496043875813484,
+ "num_tokens": 1686136.0,
+ "step": 1500
+ },
+ {
+ "epoch": 1.3422222222222222,
+ "grad_norm": 1.9462794065475464,
+ "learning_rate": 0.00011401526717557252,
+ "loss": 1.8412,
+ "mean_token_accuracy": 0.6603908941149712,
+ "num_tokens": 1697160.0,
+ "step": 1510
+ },
+ {
+ "epoch": 1.3511111111111112,
+ "grad_norm": 1.6794313192367554,
+ "learning_rate": 0.00011340458015267177,
+ "loss": 1.7774,
+ "mean_token_accuracy": 0.6664682924747467,
+ "num_tokens": 1707831.0,
+ "step": 1520
+ },
+ {
+ "epoch": 1.3599999999999999,
+ "grad_norm": 1.8189337253570557,
+ "learning_rate": 0.000112793893129771,
+ "loss": 1.8031,
+ "mean_token_accuracy": 0.6627006307244301,
+ "num_tokens": 1719074.0,
+ "step": 1530
+ },
+ {
+ "epoch": 1.3688888888888888,
+ "grad_norm": 2.073533296585083,
+ "learning_rate": 0.00011218320610687022,
+ "loss": 1.8657,
+ "mean_token_accuracy": 0.6476830393075943,
+ "num_tokens": 1730388.0,
+ "step": 1540
+ },
+ {
+ "epoch": 1.3777777777777778,
+ "grad_norm": 2.1564207077026367,
+ "learning_rate": 0.00011157251908396946,
+ "loss": 1.8261,
+ "mean_token_accuracy": 0.6567840203642845,
+ "num_tokens": 1741806.0,
+ "step": 1550
+ },
+ {
+ "epoch": 1.3866666666666667,
+ "grad_norm": 1.6113232374191284,
+ "learning_rate": 0.00011096183206106871,
+ "loss": 1.7753,
+ "mean_token_accuracy": 0.6659888163208961,
+ "num_tokens": 1753313.0,
1484
+ "step": 1560
1485
+ },
1486
+ {
1487
+ "epoch": 1.3955555555555557,
1488
+ "grad_norm": 1.8112174272537231,
1489
+ "learning_rate": 0.00011035114503816795,
1490
+ "loss": 1.8046,
1491
+ "mean_token_accuracy": 0.6593015149235726,
1492
+ "num_tokens": 1765144.0,
1493
+ "step": 1570
1494
+ },
1495
+ {
1496
+ "epoch": 1.4044444444444444,
1497
+ "grad_norm": 1.8377541303634644,
1498
+ "learning_rate": 0.00010974045801526718,
1499
+ "loss": 1.8848,
1500
+ "mean_token_accuracy": 0.6533517614006996,
1501
+ "num_tokens": 1776783.0,
1502
+ "step": 1580
1503
+ },
1504
+ {
1505
+ "epoch": 1.4133333333333333,
1506
+ "grad_norm": 1.8384325504302979,
1507
+ "learning_rate": 0.0001091297709923664,
1508
+ "loss": 1.7669,
1509
+ "mean_token_accuracy": 0.6613995045423507,
1510
+ "num_tokens": 1788274.0,
1511
+ "step": 1590
1512
+ },
1513
+ {
1514
+ "epoch": 1.4222222222222223,
1515
+ "grad_norm": 1.8124533891677856,
1516
+ "learning_rate": 0.00010851908396946567,
1517
+ "loss": 1.8164,
1518
+ "mean_token_accuracy": 0.6591159239411354,
1519
+ "num_tokens": 1799707.0,
1520
+ "step": 1600
1521
+ },
1522
+ {
1523
+ "epoch": 1.4222222222222223,
1524
+ "eval_loss": 1.9286668300628662,
1525
+ "eval_mean_token_accuracy": 0.6434953879117966,
1526
+ "eval_num_tokens": 1799707.0,
1527
+ "eval_runtime": 48.6198,
1528
+ "eval_samples_per_second": 20.568,
1529
+ "eval_steps_per_second": 10.284,
1530
+ "step": 1600
1531
+ },
1532
+ {
1533
+ "epoch": 1.431111111111111,
1534
+ "grad_norm": 1.6931661367416382,
1535
+ "learning_rate": 0.00010790839694656489,
1536
+ "loss": 1.7548,
1537
+ "mean_token_accuracy": 0.664087076485157,
1538
+ "num_tokens": 1810865.0,
1539
+ "step": 1610
1540
+ },
1541
+ {
1542
+ "epoch": 1.44,
1543
+ "grad_norm": 1.7501254081726074,
1544
+ "learning_rate": 0.00010729770992366413,
1545
+ "loss": 1.7652,
1546
+ "mean_token_accuracy": 0.6640020117163659,
1547
+ "num_tokens": 1821807.0,
1548
+ "step": 1620
1549
+ },
1550
+ {
1551
+ "epoch": 1.448888888888889,
1552
+ "grad_norm": 1.8411732912063599,
1553
+ "learning_rate": 0.00010668702290076336,
1554
+ "loss": 1.831,
1555
+ "mean_token_accuracy": 0.6564242169260979,
1556
+ "num_tokens": 1832886.0,
1557
+ "step": 1630
1558
+ },
1559
+ {
1560
+ "epoch": 1.4577777777777778,
1561
+ "grad_norm": 2.003892183303833,
1562
+ "learning_rate": 0.00010607633587786261,
1563
+ "loss": 1.7791,
1564
+ "mean_token_accuracy": 0.6632592365145683,
1565
+ "num_tokens": 1843989.0,
1566
+ "step": 1640
1567
+ },
1568
+ {
1569
+ "epoch": 1.4666666666666668,
1570
+ "grad_norm": 1.7987340688705444,
1571
+ "learning_rate": 0.00010546564885496185,
1572
+ "loss": 1.7627,
1573
+ "mean_token_accuracy": 0.6713873609900475,
1574
+ "num_tokens": 1855106.0,
1575
+ "step": 1650
1576
+ },
1577
+ {
1578
+ "epoch": 1.4755555555555555,
1579
+ "grad_norm": 1.931877851486206,
1580
+ "learning_rate": 0.00010485496183206107,
1581
+ "loss": 1.7976,
1582
+ "mean_token_accuracy": 0.6631382897496223,
1583
+ "num_tokens": 1866900.0,
1584
+ "step": 1660
1585
+ },
1586
+ {
1587
+ "epoch": 1.4844444444444445,
1588
+ "grad_norm": 1.7883687019348145,
1589
+ "learning_rate": 0.0001042442748091603,
1590
+ "loss": 1.7671,
1591
+ "mean_token_accuracy": 0.6675158813595772,
1592
+ "num_tokens": 1877911.0,
1593
+ "step": 1670
1594
+ },
1595
+ {
1596
+ "epoch": 1.4933333333333334,
1597
+ "grad_norm": 1.8195563554763794,
1598
+ "learning_rate": 0.00010363358778625955,
1599
+ "loss": 1.8346,
1600
+ "mean_token_accuracy": 0.652577318251133,
1601
+ "num_tokens": 1889580.0,
1602
+ "step": 1680
1603
+ },
1604
+ {
1605
+ "epoch": 1.5022222222222221,
1606
+ "grad_norm": 1.7439149618148804,
1607
+ "learning_rate": 0.00010302290076335879,
1608
+ "loss": 1.7476,
1609
+ "mean_token_accuracy": 0.6717594474554062,
1610
+ "num_tokens": 1901133.0,
1611
+ "step": 1690
1612
+ },
1613
+ {
1614
+ "epoch": 1.511111111111111,
1615
+ "grad_norm": 1.8155314922332764,
1616
+ "learning_rate": 0.00010241221374045801,
1617
+ "loss": 1.8044,
1618
+ "mean_token_accuracy": 0.6617274522781372,
1619
+ "num_tokens": 1911796.0,
1620
+ "step": 1700
1621
+ },
1622
+ {
1623
+ "epoch": 1.52,
1624
+ "grad_norm": 1.7685112953186035,
1625
+ "learning_rate": 0.00010180152671755725,
1626
+ "loss": 1.7727,
1627
+ "mean_token_accuracy": 0.665304908156395,
1628
+ "num_tokens": 1923217.0,
1629
+ "step": 1710
1630
+ },
1631
+ {
1632
+ "epoch": 1.528888888888889,
1633
+ "grad_norm": 1.737053632736206,
1634
+ "learning_rate": 0.0001011908396946565,
1635
+ "loss": 1.8345,
1636
+ "mean_token_accuracy": 0.6577870160341263,
1637
+ "num_tokens": 1934355.0,
1638
+ "step": 1720
1639
+ },
1640
+ {
1641
+ "epoch": 1.537777777777778,
1642
+ "grad_norm": 1.9686291217803955,
1643
+ "learning_rate": 0.00010058015267175573,
1644
+ "loss": 1.8165,
1645
+ "mean_token_accuracy": 0.6594037398695946,
1646
+ "num_tokens": 1945653.0,
1647
+ "step": 1730
1648
+ },
1649
+ {
1650
+ "epoch": 1.5466666666666666,
1651
+ "grad_norm": 1.844651699066162,
1652
+ "learning_rate": 9.996946564885497e-05,
1653
+ "loss": 1.8273,
1654
+ "mean_token_accuracy": 0.6566928923130035,
1655
+ "num_tokens": 1956891.0,
1656
+ "step": 1740
1657
+ },
1658
+ {
1659
+ "epoch": 1.5555555555555556,
1660
+ "grad_norm": 1.8607743978500366,
1661
+ "learning_rate": 9.93587786259542e-05,
1662
+ "loss": 1.785,
1663
+ "mean_token_accuracy": 0.6692357853055,
1664
+ "num_tokens": 1967789.0,
1665
+ "step": 1750
1666
+ },
1667
+ {
1668
+ "epoch": 1.5644444444444443,
1669
+ "grad_norm": 1.9204373359680176,
1670
+ "learning_rate": 9.874809160305344e-05,
1671
+ "loss": 1.8264,
1672
+ "mean_token_accuracy": 0.6549209818243981,
1673
+ "num_tokens": 1979224.0,
1674
+ "step": 1760
1675
+ },
1676
+ {
1677
+ "epoch": 1.5733333333333333,
1678
+ "grad_norm": 1.7754265069961548,
1679
+ "learning_rate": 9.813740458015268e-05,
1680
+ "loss": 1.7467,
1681
+ "mean_token_accuracy": 0.6670090600848197,
1682
+ "num_tokens": 1990255.0,
1683
+ "step": 1770
1684
+ },
1685
+ {
1686
+ "epoch": 1.5822222222222222,
1687
+ "grad_norm": 2.069091796875,
1688
+ "learning_rate": 9.752671755725191e-05,
1689
+ "loss": 1.7731,
1690
+ "mean_token_accuracy": 0.6609751120209694,
1691
+ "num_tokens": 2001606.0,
1692
+ "step": 1780
1693
+ },
1694
+ {
1695
+ "epoch": 1.5911111111111111,
1696
+ "grad_norm": 2.1375646591186523,
1697
+ "learning_rate": 9.691603053435115e-05,
1698
+ "loss": 1.8009,
1699
+ "mean_token_accuracy": 0.6624869346618653,
1700
+ "num_tokens": 2012912.0,
1701
+ "step": 1790
1702
+ },
1703
+ {
1704
+ "epoch": 1.6,
1705
+ "grad_norm": 1.5623434782028198,
1706
+ "learning_rate": 9.630534351145038e-05,
1707
+ "loss": 1.7383,
1708
+ "mean_token_accuracy": 0.6694582119584084,
1709
+ "num_tokens": 2024571.0,
1710
+ "step": 1800
1711
+ },
1712
+ {
1713
+ "epoch": 1.6,
1714
+ "eval_loss": 1.90510892868042,
1715
+ "eval_mean_token_accuracy": 0.6464553346633911,
1716
+ "eval_num_tokens": 2024571.0,
1717
+ "eval_runtime": 48.9449,
1718
+ "eval_samples_per_second": 20.431,
1719
+ "eval_steps_per_second": 10.216,
1720
+ "step": 1800
1721
+ },
1722
+ {
1723
+ "epoch": 1.608888888888889,
1724
+ "grad_norm": 1.745969295501709,
1725
+ "learning_rate": 9.569465648854963e-05,
1726
+ "loss": 1.7552,
1727
+ "mean_token_accuracy": 0.6786300778388977,
1728
+ "num_tokens": 2035783.0,
1729
+ "step": 1810
1730
+ },
1731
+ {
1732
+ "epoch": 1.6177777777777778,
1733
+ "grad_norm": 1.7463303804397583,
1734
+ "learning_rate": 9.508396946564886e-05,
1735
+ "loss": 1.7495,
1736
+ "mean_token_accuracy": 0.6666959136724472,
1737
+ "num_tokens": 2047304.0,
1738
+ "step": 1820
1739
+ },
1740
+ {
1741
+ "epoch": 1.6266666666666667,
1742
+ "grad_norm": 1.9058139324188232,
1743
+ "learning_rate": 9.44732824427481e-05,
1744
+ "loss": 1.8365,
1745
+ "mean_token_accuracy": 0.6536470741033554,
1746
+ "num_tokens": 2058792.0,
1747
+ "step": 1830
1748
+ },
1749
+ {
1750
+ "epoch": 1.6355555555555554,
1751
+ "grad_norm": 2.065488576889038,
1752
+ "learning_rate": 9.386259541984733e-05,
1753
+ "loss": 1.7939,
1754
+ "mean_token_accuracy": 0.6519258007407188,
1755
+ "num_tokens": 2070175.0,
1756
+ "step": 1840
1757
+ },
1758
+ {
1759
+ "epoch": 1.6444444444444444,
1760
+ "grad_norm": 1.778023600578308,
1761
+ "learning_rate": 9.325190839694658e-05,
1762
+ "loss": 1.8155,
1763
+ "mean_token_accuracy": 0.655296416580677,
1764
+ "num_tokens": 2081343.0,
1765
+ "step": 1850
1766
+ },
1767
+ {
1768
+ "epoch": 1.6533333333333333,
1769
+ "grad_norm": 1.7437517642974854,
1770
+ "learning_rate": 9.26412213740458e-05,
1771
+ "loss": 1.7996,
1772
+ "mean_token_accuracy": 0.6618543311953544,
1773
+ "num_tokens": 2093074.0,
1774
+ "step": 1860
1775
+ },
1776
+ {
1777
+ "epoch": 1.6622222222222223,
1778
+ "grad_norm": 1.7666471004486084,
1779
+ "learning_rate": 9.203053435114505e-05,
1780
+ "loss": 1.7658,
1781
+ "mean_token_accuracy": 0.6631957843899727,
1782
+ "num_tokens": 2104640.0,
1783
+ "step": 1870
1784
+ },
1785
+ {
1786
+ "epoch": 1.6711111111111112,
1787
+ "grad_norm": 1.912842869758606,
1788
+ "learning_rate": 9.141984732824428e-05,
1789
+ "loss": 1.7996,
1790
+ "mean_token_accuracy": 0.6606781020760536,
1791
+ "num_tokens": 2115628.0,
1792
+ "step": 1880
1793
+ },
1794
+ {
1795
+ "epoch": 1.6800000000000002,
1796
+ "grad_norm": 1.7230331897735596,
1797
+ "learning_rate": 9.080916030534351e-05,
1798
+ "loss": 1.8042,
1799
+ "mean_token_accuracy": 0.6600380197167397,
1800
+ "num_tokens": 2126505.0,
1801
+ "step": 1890
1802
+ },
1803
+ {
1804
+ "epoch": 1.6888888888888889,
1805
+ "grad_norm": 1.7043401002883911,
1806
+ "learning_rate": 9.019847328244276e-05,
1807
+ "loss": 1.7993,
1808
+ "mean_token_accuracy": 0.6613149493932724,
1809
+ "num_tokens": 2138364.0,
1810
+ "step": 1900
1811
+ },
1812
+ {
1813
+ "epoch": 1.6977777777777778,
1814
+ "grad_norm": 1.9145572185516357,
1815
+ "learning_rate": 8.958778625954198e-05,
1816
+ "loss": 1.8046,
1817
+ "mean_token_accuracy": 0.662477345764637,
1818
+ "num_tokens": 2149425.0,
1819
+ "step": 1910
1820
+ },
1821
+ {
1822
+ "epoch": 1.7066666666666666,
1823
+ "grad_norm": 1.7448140382766724,
1824
+ "learning_rate": 8.897709923664123e-05,
1825
+ "loss": 1.8004,
1826
+ "mean_token_accuracy": 0.6539181426167489,
1827
+ "num_tokens": 2160843.0,
1828
+ "step": 1920
1829
+ },
1830
+ {
1831
+ "epoch": 1.7155555555555555,
1832
+ "grad_norm": 1.8304840326309204,
1833
+ "learning_rate": 8.836641221374045e-05,
1834
+ "loss": 1.8404,
1835
+ "mean_token_accuracy": 0.6593489304184914,
1836
+ "num_tokens": 2172044.0,
1837
+ "step": 1930
1838
+ },
1839
+ {
1840
+ "epoch": 1.7244444444444444,
1841
+ "grad_norm": 1.802331566810608,
1842
+ "learning_rate": 8.77557251908397e-05,
1843
+ "loss": 1.7995,
1844
+ "mean_token_accuracy": 0.6634193584322929,
1845
+ "num_tokens": 2182916.0,
1846
+ "step": 1940
1847
+ },
1848
+ {
1849
+ "epoch": 1.7333333333333334,
1850
+ "grad_norm": 1.9834682941436768,
1851
+ "learning_rate": 8.714503816793894e-05,
1852
+ "loss": 1.7525,
1853
+ "mean_token_accuracy": 0.6685526207089424,
1854
+ "num_tokens": 2194913.0,
1855
+ "step": 1950
1856
+ },
1857
+ {
1858
+ "epoch": 1.7422222222222223,
1859
+ "grad_norm": 1.8077235221862793,
1860
+ "learning_rate": 8.653435114503817e-05,
1861
+ "loss": 1.7612,
1862
+ "mean_token_accuracy": 0.6704939991235733,
1863
+ "num_tokens": 2205721.0,
1864
+ "step": 1960
1865
+ },
1866
+ {
1867
+ "epoch": 1.751111111111111,
1868
+ "grad_norm": 1.957993745803833,
1869
+ "learning_rate": 8.592366412213741e-05,
1870
+ "loss": 1.8059,
1871
+ "mean_token_accuracy": 0.6547697961330414,
1872
+ "num_tokens": 2217489.0,
1873
+ "step": 1970
1874
+ },
1875
+ {
1876
+ "epoch": 1.76,
1877
+ "grad_norm": 1.7215981483459473,
1878
+ "learning_rate": 8.531297709923664e-05,
1879
+ "loss": 1.7913,
1880
+ "mean_token_accuracy": 0.657075221836567,
1881
+ "num_tokens": 2228972.0,
1882
+ "step": 1980
1883
+ },
1884
+ {
1885
+ "epoch": 1.7688888888888887,
1886
+ "grad_norm": 1.8760231733322144,
1887
+ "learning_rate": 8.470229007633588e-05,
1888
+ "loss": 1.7923,
1889
+ "mean_token_accuracy": 0.6629065066576004,
1890
+ "num_tokens": 2240239.0,
1891
+ "step": 1990
1892
+ },
1893
+ {
1894
+ "epoch": 1.7777777777777777,
1895
+ "grad_norm": 2.092407703399658,
1896
+ "learning_rate": 8.409160305343512e-05,
1897
+ "loss": 1.7593,
1898
+ "mean_token_accuracy": 0.6686230883002281,
1899
+ "num_tokens": 2251436.0,
1900
+ "step": 2000
1901
+ },
1902
+ {
1903
+ "epoch": 1.7777777777777777,
1904
+ "eval_loss": 1.893255591392517,
1905
+ "eval_mean_token_accuracy": 0.6482590944766998,
1906
+ "eval_num_tokens": 2251436.0,
1907
+ "eval_runtime": 49.0676,
1908
+ "eval_samples_per_second": 20.38,
1909
+ "eval_steps_per_second": 10.19,
1910
+ "step": 2000
1911
+ },
1912
+ {
1913
+ "epoch": 1.7866666666666666,
1914
+ "grad_norm": 1.7836107015609741,
1915
+ "learning_rate": 8.348091603053435e-05,
1916
+ "loss": 1.8033,
1917
+ "mean_token_accuracy": 0.6598399996757507,
1918
+ "num_tokens": 2263069.0,
1919
+ "step": 2010
1920
+ },
1921
+ {
1922
+ "epoch": 1.7955555555555556,
1923
+ "grad_norm": 1.7955141067504883,
1924
+ "learning_rate": 8.287022900763359e-05,
1925
+ "loss": 1.7922,
1926
+ "mean_token_accuracy": 0.6619856491684913,
1927
+ "num_tokens": 2274050.0,
1928
+ "step": 2020
1929
+ },
1930
+ {
1931
+ "epoch": 1.8044444444444445,
1932
+ "grad_norm": 1.7887564897537231,
1933
+ "learning_rate": 8.225954198473282e-05,
1934
+ "loss": 1.8353,
1935
+ "mean_token_accuracy": 0.658150726556778,
1936
+ "num_tokens": 2285060.0,
1937
+ "step": 2030
1938
+ },
1939
+ {
1940
+ "epoch": 1.8133333333333335,
1941
+ "grad_norm": 1.8892567157745361,
1942
+ "learning_rate": 8.164885496183207e-05,
1943
+ "loss": 1.7266,
1944
+ "mean_token_accuracy": 0.6728688895702362,
1945
+ "num_tokens": 2296211.0,
1946
+ "step": 2040
1947
+ },
1948
+ {
1949
+ "epoch": 1.8222222222222222,
1950
+ "grad_norm": 1.9226106405258179,
1951
+ "learning_rate": 8.10381679389313e-05,
1952
+ "loss": 1.7243,
1953
+ "mean_token_accuracy": 0.6712497785687447,
1954
+ "num_tokens": 2307184.0,
1955
+ "step": 2050
1956
+ },
1957
+ {
1958
+ "epoch": 1.8311111111111111,
1959
+ "grad_norm": 1.735863208770752,
1960
+ "learning_rate": 8.042748091603054e-05,
1961
+ "loss": 1.7739,
1962
+ "mean_token_accuracy": 0.6621047109365463,
1963
+ "num_tokens": 2318602.0,
1964
+ "step": 2060
1965
+ },
1966
+ {
1967
+ "epoch": 1.8399999999999999,
1968
+ "grad_norm": 1.8361355066299438,
1969
+ "learning_rate": 7.981679389312977e-05,
1970
+ "loss": 1.8223,
1971
+ "mean_token_accuracy": 0.6560095950961113,
1972
+ "num_tokens": 2330193.0,
1973
+ "step": 2070
1974
+ },
1975
+ {
1976
+ "epoch": 1.8488888888888888,
1977
+ "grad_norm": 1.8159486055374146,
1978
+ "learning_rate": 7.920610687022902e-05,
1979
+ "loss": 1.7695,
1980
+ "mean_token_accuracy": 0.6657541528344154,
1981
+ "num_tokens": 2341442.0,
1982
+ "step": 2080
1983
+ },
1984
+ {
1985
+ "epoch": 1.8577777777777778,
1986
+ "grad_norm": 1.9189419746398926,
1987
+ "learning_rate": 7.859541984732824e-05,
1988
+ "loss": 1.8333,
1989
+ "mean_token_accuracy": 0.6628425523638726,
1990
+ "num_tokens": 2352479.0,
1991
+ "step": 2090
1992
+ },
1993
+ {
1994
+ "epoch": 1.8666666666666667,
1995
+ "grad_norm": 1.8809512853622437,
1996
+ "learning_rate": 7.798473282442749e-05,
1997
+ "loss": 1.7371,
1998
+ "mean_token_accuracy": 0.6683435723185539,
1999
+ "num_tokens": 2363642.0,
2000
+ "step": 2100
2001
+ },
2002
+ {
2003
+ "epoch": 1.8755555555555556,
2004
+ "grad_norm": 1.845886468887329,
2005
+ "learning_rate": 7.737404580152672e-05,
2006
+ "loss": 1.7774,
2007
+ "mean_token_accuracy": 0.6559944331645966,
2008
+ "num_tokens": 2375376.0,
2009
+ "step": 2110
2010
+ },
2011
+ {
2012
+ "epoch": 1.8844444444444446,
2013
+ "grad_norm": 1.7780894041061401,
2014
+ "learning_rate": 7.676335877862596e-05,
2015
+ "loss": 1.7823,
2016
+ "mean_token_accuracy": 0.6601730152964592,
2017
+ "num_tokens": 2386944.0,
2018
+ "step": 2120
2019
+ },
2020
+ {
2021
+ "epoch": 1.8933333333333333,
2022
+ "grad_norm": 1.9167022705078125,
2023
+ "learning_rate": 7.61526717557252e-05,
2024
+ "loss": 1.7869,
2025
+ "mean_token_accuracy": 0.6573449537158013,
2026
+ "num_tokens": 2398391.0,
2027
+ "step": 2130
2028
+ },
2029
+ {
2030
+ "epoch": 1.9022222222222223,
2031
+ "grad_norm": 2.037911891937256,
2032
+ "learning_rate": 7.554198473282443e-05,
2033
+ "loss": 1.7858,
2034
+ "mean_token_accuracy": 0.6593190267682075,
2035
+ "num_tokens": 2409837.0,
2036
+ "step": 2140
2037
+ },
2038
+ {
2039
+ "epoch": 1.911111111111111,
2040
+ "grad_norm": 1.7496647834777832,
2041
+ "learning_rate": 7.493129770992367e-05,
2042
+ "loss": 1.7241,
2043
+ "mean_token_accuracy": 0.6702290028333664,
2044
+ "num_tokens": 2421607.0,
2045
+ "step": 2150
2046
+ },
2047
+ {
2048
+ "epoch": 1.92,
2049
+ "grad_norm": 2.0227596759796143,
2050
+ "learning_rate": 7.43206106870229e-05,
2051
+ "loss": 1.7731,
2052
+ "mean_token_accuracy": 0.6679618924856185,
2053
+ "num_tokens": 2432376.0,
2054
+ "step": 2160
2055
+ },
2056
+ {
2057
+ "epoch": 1.9288888888888889,
2058
+ "grad_norm": 1.7401562929153442,
2059
+ "learning_rate": 7.370992366412214e-05,
2060
+ "loss": 1.7684,
2061
+ "mean_token_accuracy": 0.6676609605550766,
2062
+ "num_tokens": 2443683.0,
2063
+ "step": 2170
2064
+ },
2065
+ {
2066
+ "epoch": 1.9377777777777778,
2067
+ "grad_norm": 2.709106922149658,
2068
+ "learning_rate": 7.309923664122137e-05,
2069
+ "loss": 1.709,
2070
+ "mean_token_accuracy": 0.6738818466663361,
2071
+ "num_tokens": 2454757.0,
2072
+ "step": 2180
2073
+ },
2074
+ {
2075
+ "epoch": 1.9466666666666668,
2076
+ "grad_norm": 1.8504191637039185,
2077
+ "learning_rate": 7.248854961832061e-05,
2078
+ "loss": 1.7411,
2079
+ "mean_token_accuracy": 0.6681609645485878,
2080
+ "num_tokens": 2465562.0,
2081
+ "step": 2190
2082
+ },
2083
+ {
2084
+ "epoch": 1.9555555555555557,
2085
+ "grad_norm": 1.9488162994384766,
2086
+ "learning_rate": 7.187786259541986e-05,
2087
+ "loss": 1.7927,
2088
+ "mean_token_accuracy": 0.6587553441524505,
2089
+ "num_tokens": 2476869.0,
2090
+ "step": 2200
2091
+ },
2092
+ {
2093
+ "epoch": 1.9555555555555557,
2094
+ "eval_loss": 1.8803235292434692,
2095
+ "eval_mean_token_accuracy": 0.6499251070022583,
2096
+ "eval_num_tokens": 2476869.0,
2097
+ "eval_runtime": 47.7648,
2098
+ "eval_samples_per_second": 20.936,
2099
+ "eval_steps_per_second": 10.468,
2100
+ "step": 2200
2101
+ },
2102
+ {
2103
+ "epoch": 1.9644444444444444,
2104
+ "grad_norm": 1.9747337102890015,
2105
+ "learning_rate": 7.132824427480917e-05,
2106
+ "loss": 1.7689,
2107
+ "mean_token_accuracy": 0.666295376420021,
2108
+ "num_tokens": 2487704.0,
2109
+ "step": 2210
2110
+ },
2111
+ {
2112
+ "epoch": 1.9733333333333334,
2113
+ "grad_norm": 1.8904316425323486,
2114
+ "learning_rate": 7.071755725190839e-05,
2115
+ "loss": 1.7538,
2116
+ "mean_token_accuracy": 0.6645636394619941,
2117
+ "num_tokens": 2498918.0,
2118
+ "step": 2220
2119
+ },
2120
+ {
2121
+ "epoch": 1.982222222222222,
2122
+ "grad_norm": 1.8791844844818115,
2123
+ "learning_rate": 7.010687022900764e-05,
2124
+ "loss": 1.7926,
2125
+ "mean_token_accuracy": 0.6631673067808151,
2126
+ "num_tokens": 2509728.0,
2127
+ "step": 2230
2128
+ },
2129
+ {
2130
+ "epoch": 1.991111111111111,
2131
+ "grad_norm": 1.9756606817245483,
2132
+ "learning_rate": 6.949618320610687e-05,
2133
+ "loss": 1.7863,
2134
+ "mean_token_accuracy": 0.6628521859645844,
2135
+ "num_tokens": 2521073.0,
2136
+ "step": 2240
2137
+ },
2138
+ {
2139
+ "epoch": 2.0,
2140
+ "grad_norm": 1.7894699573516846,
2141
+ "learning_rate": 6.888549618320611e-05,
2142
+ "loss": 1.7539,
2143
+ "mean_token_accuracy": 0.6728802308440208,
2144
+ "num_tokens": 2531820.0,
2145
+ "step": 2250
2146
+ },
2147
+ {
2148
+ "epoch": 2.008888888888889,
2149
+ "grad_norm": 1.702850341796875,
2150
+ "learning_rate": 6.827480916030535e-05,
2151
+ "loss": 1.4903,
2152
+ "mean_token_accuracy": 0.7138098135590554,
2153
+ "num_tokens": 2542512.0,
2154
+ "step": 2260
2155
+ },
2156
+ {
2157
+ "epoch": 2.017777777777778,
2158
+ "grad_norm": 1.7931528091430664,
2159
+ "learning_rate": 6.766412213740458e-05,
2160
+ "loss": 1.601,
2161
+ "mean_token_accuracy": 0.6894692406058311,
2162
+ "num_tokens": 2553338.0,
2163
+ "step": 2270
2164
+ },
2165
+ {
2166
+ "epoch": 2.026666666666667,
2167
+ "grad_norm": 2.228480339050293,
2168
+ "learning_rate": 6.705343511450382e-05,
2169
+ "loss": 1.609,
2170
+ "mean_token_accuracy": 0.6943154886364937,
2171
+ "num_tokens": 2564182.0,
2172
+ "step": 2280
2173
+ },
2174
+ {
2175
+ "epoch": 2.0355555555555553,
2176
+ "grad_norm": 1.9658042192459106,
2177
+ "learning_rate": 6.644274809160305e-05,
2178
+ "loss": 1.6545,
2179
+ "mean_token_accuracy": 0.6824306204915047,
2180
+ "num_tokens": 2575789.0,
2181
+ "step": 2290
2182
+ },
2183
+ {
2184
+ "epoch": 2.0444444444444443,
2185
+ "grad_norm": 1.7540594339370728,
2186
+ "learning_rate": 6.583206106870229e-05,
2187
+ "loss": 1.6229,
2188
+ "mean_token_accuracy": 0.6881745710968972,
2189
+ "num_tokens": 2587147.0,
2190
+ "step": 2300
2191
+ },
2192
+ {
2193
+ "epoch": 2.0533333333333332,
2194
+ "grad_norm": 1.799501895904541,
2195
+ "learning_rate": 6.522137404580153e-05,
2196
+ "loss": 1.6119,
2197
+ "mean_token_accuracy": 0.6896049126982688,
2198
+ "num_tokens": 2598282.0,
2199
+ "step": 2310
2200
+ },
2201
+ {
2202
+ "epoch": 2.062222222222222,
2203
+ "grad_norm": 1.7720867395401,
2204
+ "learning_rate": 6.461068702290076e-05,
2205
+ "loss": 1.5519,
2206
+ "mean_token_accuracy": 0.7038252353668213,
2207
+ "num_tokens": 2609125.0,
2208
+ "step": 2320
2209
+ },
2210
+ {
2211
+ "epoch": 2.071111111111111,
2212
+ "grad_norm": 1.994992971420288,
2213
+ "learning_rate": 6.400000000000001e-05,
2214
+ "loss": 1.5872,
2215
+ "mean_token_accuracy": 0.690100908279419,
2216
+ "num_tokens": 2620411.0,
2217
+ "step": 2330
2218
+ },
2219
+ {
2220
+ "epoch": 2.08,
2221
+ "grad_norm": 1.9283640384674072,
2222
+ "learning_rate": 6.338931297709923e-05,
2223
+ "loss": 1.5867,
2224
+ "mean_token_accuracy": 0.6923216238617897,
2225
+ "num_tokens": 2631795.0,
2226
+ "step": 2340
2227
+ },
2228
+ {
2229
+ "epoch": 2.088888888888889,
2230
+ "grad_norm": 1.9957973957061768,
2231
+ "learning_rate": 6.277862595419848e-05,
2232
+ "loss": 1.5996,
2233
+ "mean_token_accuracy": 0.6924369186162949,
2234
+ "num_tokens": 2643179.0,
2235
+ "step": 2350
2236
+ },
2237
+ {
2238
+ "epoch": 2.097777777777778,
2239
+ "grad_norm": 2.0207560062408447,
2240
+ "learning_rate": 6.21679389312977e-05,
2241
+ "loss": 1.515,
2242
+ "mean_token_accuracy": 0.7066755428910255,
2243
+ "num_tokens": 2654206.0,
2244
+ "step": 2360
2245
+ },
2246
+ {
2247
+ "epoch": 2.1066666666666665,
2248
+ "grad_norm": 1.8871878385543823,
2249
+ "learning_rate": 6.155725190839695e-05,
2250
+ "loss": 1.6139,
2251
+ "mean_token_accuracy": 0.687422800064087,
2252
+ "num_tokens": 2665582.0,
2253
+ "step": 2370
2254
+ },
2255
+ {
2256
+ "epoch": 2.1155555555555554,
2257
+ "grad_norm": 1.717610478401184,
2258
+ "learning_rate": 6.094656488549618e-05,
2259
+ "loss": 1.6388,
2260
+ "mean_token_accuracy": 0.6870575189590454,
2261
+ "num_tokens": 2677533.0,
2262
+ "step": 2380
2263
+ },
2264
+ {
2265
+ "epoch": 2.1244444444444444,
2266
+ "grad_norm": 1.8574187755584717,
2267
+ "learning_rate": 6.0335877862595426e-05,
2268
+ "loss": 1.557,
2269
+ "mean_token_accuracy": 0.6999430671334267,
2270
+ "num_tokens": 2688755.0,
2271
+ "step": 2390
2272
+ },
2273
+ {
2274
+ "epoch": 2.1333333333333333,
2275
+ "grad_norm": 1.9739580154418945,
2276
+ "learning_rate": 5.9725190839694655e-05,
2277
+ "loss": 1.6553,
2278
+ "mean_token_accuracy": 0.6819543272256852,
2279
+ "num_tokens": 2700558.0,
2280
+ "step": 2400
2281
+ },
2282
+ {
2283
+ "epoch": 2.1333333333333333,
2284
+ "eval_loss": 1.8970768451690674,
2285
+ "eval_mean_token_accuracy": 0.6490416256189346,
2286
+ "eval_num_tokens": 2700558.0,
2287
+ "eval_runtime": 47.6704,
2288
+ "eval_samples_per_second": 20.977,
2289
+ "eval_steps_per_second": 10.489,
2290
+ "step": 2400
2291
+ },
2292
+ {
2293
+ "epoch": 2.1422222222222222,
2294
+ "grad_norm": 1.893918514251709,
2295
+ "learning_rate": 5.91145038167939e-05,
2296
+ "loss": 1.5459,
2297
+ "mean_token_accuracy": 0.6963777393102646,
2298
+ "num_tokens": 2711713.0,
2299
+ "step": 2410
2300
+ },
2301
+ {
2302
+ "epoch": 2.151111111111111,
2303
+ "grad_norm": 1.9607445001602173,
2304
+ "learning_rate": 5.850381679389313e-05,
2305
+ "loss": 1.6373,
2306
+ "mean_token_accuracy": 0.6815788432955742,
2307
+ "num_tokens": 2723686.0,
2308
+ "step": 2420
2309
+ },
2310
+ {
2311
+ "epoch": 2.16,
2312
+ "grad_norm": 2.091732978820801,
2313
+ "learning_rate": 5.789312977099237e-05,
2314
+ "loss": 1.6422,
2315
+ "mean_token_accuracy": 0.6811213716864586,
2316
+ "num_tokens": 2735300.0,
2317
+ "step": 2430
2318
+ },
2319
+ {
2320
+ "epoch": 2.168888888888889,
2321
+ "grad_norm": 2.1138076782226562,
2322
+ "learning_rate": 5.7282442748091605e-05,
2323
+ "loss": 1.5848,
2324
+ "mean_token_accuracy": 0.6962573245167732,
2325
+ "num_tokens": 2746248.0,
2326
+ "step": 2440
2327
+ },
2328
+ {
2329
+ "epoch": 2.1777777777777776,
2330
+ "grad_norm": 2.1495392322540283,
2331
+ "learning_rate": 5.667175572519085e-05,
2332
+ "loss": 1.576,
2333
+ "mean_token_accuracy": 0.6990228727459907,
2334
+ "num_tokens": 2757259.0,
2335
+ "step": 2450
2336
+ },
2337
+ {
2338
+ "epoch": 2.1866666666666665,
2339
+ "grad_norm": 2.1444251537323,
2340
+ "learning_rate": 5.606106870229008e-05,
2341
+ "loss": 1.5979,
2342
+ "mean_token_accuracy": 0.6916472837328911,
2343
+ "num_tokens": 2768228.0,
2344
+ "step": 2460
2345
+ },
2346
+ {
2347
+ "epoch": 2.1955555555555555,
2348
+ "grad_norm": 1.945489525794983,
2349
+ "learning_rate": 5.545038167938932e-05,
2350
+ "loss": 1.5663,
2351
+ "mean_token_accuracy": 0.7005513325333595,
2352
+ "num_tokens": 2779254.0,
2353
+ "step": 2470
2354
+ },
2355
+ {
2356
+ "epoch": 2.2044444444444444,
2357
+ "grad_norm": 1.8256646394729614,
2358
+ "learning_rate": 5.483969465648855e-05,
2359
+ "loss": 1.5751,
2360
+ "mean_token_accuracy": 0.6961624413728714,
2361
+ "num_tokens": 2790326.0,
2362
+ "step": 2480
2363
+ },
2364
+ {
2365
+ "epoch": 2.2133333333333334,
2366
+ "grad_norm": 1.9541441202163696,
2367
+ "learning_rate": 5.422900763358779e-05,
2368
+ "loss": 1.6268,
2369
+ "mean_token_accuracy": 0.6893054991960526,
2370
+ "num_tokens": 2801625.0,
2371
+ "step": 2490
2372
+ },
2373
+ {
2374
+ "epoch": 2.2222222222222223,
2375
+ "grad_norm": 2.0127615928649902,
2376
+ "learning_rate": 5.361832061068702e-05,
2377
+ "loss": 1.6096,
2378
+ "mean_token_accuracy": 0.6923437744379044,
2379
+ "num_tokens": 2813010.0,
2380
+ "step": 2500
2381
+ },
2382
+ {
2383
+ "epoch": 2.2311111111111113,
2384
+ "grad_norm": 2.0325839519500732,
2385
+ "learning_rate": 5.300763358778626e-05,
2386
+ "loss": 1.5963,
2387
+ "mean_token_accuracy": 0.6913090571761131,
2388
+ "num_tokens": 2824021.0,
2389
+ "step": 2510
2390
+ },
2391
+ {
2392
+ "epoch": 2.24,
2393
+ "grad_norm": 2.1595821380615234,
2394
+ "learning_rate": 5.23969465648855e-05,
2395
+ "loss": 1.5617,
2396
+ "mean_token_accuracy": 0.7037980020046234,
2397
+ "num_tokens": 2835232.0,
2398
+ "step": 2520
2399
+ },
2400
+ {
2401
+ "epoch": 2.2488888888888887,
2402
+ "grad_norm": 2.11661958694458,
2403
+ "learning_rate": 5.178625954198474e-05,
2404
+ "loss": 1.6213,
2405
+ "mean_token_accuracy": 0.6836483731865883,
2406
+ "num_tokens": 2846524.0,
2407
+ "step": 2530
2408
+ },
2409
+ {
2410
+ "epoch": 2.2577777777777777,
2411
+ "grad_norm": 1.88747239112854,
2412
+ "learning_rate": 5.117557251908397e-05,
2413
+ "loss": 1.6408,
2414
+ "mean_token_accuracy": 0.6860729962587356,
2415
+ "num_tokens": 2857788.0,
2416
+ "step": 2540
2417
+ },
2418
+ {
2419
+ "epoch": 2.2666666666666666,
2420
+ "grad_norm": 1.9622093439102173,
2421
+ "learning_rate": 5.056488549618321e-05,
2422
+ "loss": 1.5519,
2423
+ "mean_token_accuracy": 0.7002682030200958,
2424
+ "num_tokens": 2868618.0,
2425
+ "step": 2550
2426
+ },
2427
+ {
2428
+ "epoch": 2.2755555555555556,
2429
+ "grad_norm": 1.9343371391296387,
2430
+ "learning_rate": 4.995419847328244e-05,
2431
+ "loss": 1.5795,
2432
+ "mean_token_accuracy": 0.6934511423110962,
2433
+ "num_tokens": 2879999.0,
2434
+ "step": 2560
2435
+ },
2436
+ {
2437
+ "epoch": 2.2844444444444445,
2438
+ "grad_norm": 1.9991627931594849,
2439
+ "learning_rate": 4.934351145038168e-05,
2440
+ "loss": 1.6183,
2441
+ "mean_token_accuracy": 0.6901679039001465,
2442
+ "num_tokens": 2891053.0,
2443
+ "step": 2570
2444
+ },
2445
+ {
2446
+ "epoch": 2.2933333333333334,
2447
+ "grad_norm": 1.9480003118515015,
2448
+ "learning_rate": 4.8732824427480914e-05,
2449
+ "loss": 1.5826,
2450
+ "mean_token_accuracy": 0.7007558569312096,
2451
+ "num_tokens": 2901905.0,
2452
+ "step": 2580
2453
+ },
2454
+ {
2455
+ "epoch": 2.3022222222222224,
2456
+ "grad_norm": 2.021207332611084,
2457
+ "learning_rate": 4.812213740458015e-05,
2458
+ "loss": 1.6348,
2459
+ "mean_token_accuracy": 0.6848765298724174,
2460
+ "num_tokens": 2913571.0,
2461
+ "step": 2590
2462
+ },
2463
+ {
2464
+ "epoch": 2.311111111111111,
2465
+ "grad_norm": 1.8385164737701416,
2466
+ "learning_rate": 4.751145038167939e-05,
2467
+ "loss": 1.5763,
2468
+ "mean_token_accuracy": 0.6912240386009216,
2469
+ "num_tokens": 2925533.0,
2470
+ "step": 2600
2471
+ },
2472
+ {
2473
+ "epoch": 2.311111111111111,
2474
+ "eval_loss": 1.8940143585205078,
2475
+ "eval_mean_token_accuracy": 0.6499911918640137,
2476
+ "eval_num_tokens": 2925533.0,
2477
+ "eval_runtime": 47.456,
2478
+ "eval_samples_per_second": 21.072,
2479
+ "eval_steps_per_second": 10.536,
2480
+ "step": 2600
2481
+ },
2482
+ {
2483
+ "epoch": 2.32,
2484
+ "grad_norm": 1.9455375671386719,
2485
+ "learning_rate": 4.690076335877863e-05,
2486
+ "loss": 1.598,
2487
+ "mean_token_accuracy": 0.6915700435638428,
2488
+ "num_tokens": 2936620.0,
2489
+ "step": 2610
2490
+ },
2491
+ {
2492
+ "epoch": 2.328888888888889,
2493
+ "grad_norm": 1.863487720489502,
2494
+ "learning_rate": 4.6290076335877864e-05,
2495
+ "loss": 1.5512,
2496
+ "mean_token_accuracy": 0.7025073647499085,
2497
+ "num_tokens": 2947753.0,
2498
+ "step": 2620
2499
+ },
2500
+ {
2501
+ "epoch": 2.3377777777777777,
2502
+ "grad_norm": 1.9756685495376587,
2503
+ "learning_rate": 4.56793893129771e-05,
2504
+ "loss": 1.5973,
2505
+ "mean_token_accuracy": 0.6870647758245468,
2506
+ "num_tokens": 2959635.0,
2507
+ "step": 2630
2508
+ },
2509
+ {
2510
+ "epoch": 2.3466666666666667,
2511
+ "grad_norm": 2.190765142440796,
2512
+ "learning_rate": 4.5068702290076336e-05,
2513
+ "loss": 1.5948,
2514
+ "mean_token_accuracy": 0.6888303905725479,
2515
+ "num_tokens": 2971675.0,
2516
+ "step": 2640
2517
+ },
2518
+ {
2519
+ "epoch": 2.3555555555555556,
2520
+ "grad_norm": 1.827318787574768,
2521
+ "learning_rate": 4.445801526717557e-05,
2522
+ "loss": 1.5682,
2523
+ "mean_token_accuracy": 0.6952902913093567,
2524
+ "num_tokens": 2982744.0,
2525
+ "step": 2650
2526
+ },
2527
+ {
2528
+ "epoch": 2.3644444444444446,
2529
+ "grad_norm": 2.11799693107605,
2530
+ "learning_rate": 4.384732824427481e-05,
2531
+ "loss": 1.6221,
2532
+ "mean_token_accuracy": 0.6794109031558037,
2533
+ "num_tokens": 2994347.0,
2534
+ "step": 2660
2535
+ },
2536
+ {
2537
+ "epoch": 2.3733333333333335,
2538
+ "grad_norm": 2.1472220420837402,
2539
+ "learning_rate": 4.3236641221374044e-05,
2540
+ "loss": 1.6353,
2541
+ "mean_token_accuracy": 0.6876759916543961,
2542
+ "num_tokens": 3005174.0,
2543
+ "step": 2670
2544
+ },
2545
+ {
2546
+ "epoch": 2.3822222222222225,
2547
+ "grad_norm": 1.9971054792404175,
2548
+ "learning_rate": 4.2625954198473286e-05,
2549
+ "loss": 1.5372,
2550
+ "mean_token_accuracy": 0.7059834420680999,
2551
+ "num_tokens": 3016492.0,
2552
+ "step": 2680
2553
+ },
2554
+ {
2555
+ "epoch": 2.391111111111111,
2556
+ "grad_norm": 2.067861318588257,
2557
+ "learning_rate": 4.201526717557252e-05,
2558
+ "loss": 1.572,
2559
+ "mean_token_accuracy": 0.6911077201366425,
2560
+ "num_tokens": 3027826.0,
2561
+ "step": 2690
2562
+ },
2563
+ {
2564
+ "epoch": 2.4,
2565
+ "grad_norm": 2.0372536182403564,
2566
+ "learning_rate": 4.140458015267176e-05,
2567
+ "loss": 1.5615,
2568
+ "mean_token_accuracy": 0.6972797185182571,
2569
+ "num_tokens": 3038770.0,
2570
+ "step": 2700
2571
+ },
2572
+ {
2573
+ "epoch": 2.408888888888889,
2574
+ "grad_norm": 2.15972638130188,
2575
+ "learning_rate": 4.0793893129770994e-05,
2576
+ "loss": 1.5806,
2577
+ "mean_token_accuracy": 0.6947444006800652,
2578
+ "num_tokens": 3050159.0,
2579
+ "step": 2710
2580
+ },
2581
+ {
2582
+ "epoch": 2.417777777777778,
2583
+ "grad_norm": 2.059760808944702,
2584
+ "learning_rate": 4.018320610687023e-05,
2585
+ "loss": 1.6167,
2586
+ "mean_token_accuracy": 0.6882677704095841,
2587
+ "num_tokens": 3061009.0,
2588
+ "step": 2720
2589
+ },
2590
+ {
2591
+ "epoch": 2.4266666666666667,
2592
+ "grad_norm": 1.9914629459381104,
2593
+ "learning_rate": 3.9572519083969466e-05,
2594
+ "loss": 1.5508,
2595
+ "mean_token_accuracy": 0.6985371947288513,
2596
+ "num_tokens": 3072232.0,
2597
+ "step": 2730
2598
+ },
2599
+ {
2600
+ "epoch": 2.4355555555555557,
2601
+ "grad_norm": 2.0151119232177734,
2602
+ "learning_rate": 3.89618320610687e-05,
2603
+ "loss": 1.663,
2604
+ "mean_token_accuracy": 0.6849021047353745,
2605
+ "num_tokens": 3083939.0,
2606
+ "step": 2740
2607
+ },
2608
+ {
2609
+ "epoch": 2.4444444444444446,
2610
+ "grad_norm": 2.02457332611084,
2611
+ "learning_rate": 3.835114503816794e-05,
2612
+ "loss": 1.6043,
2613
+ "mean_token_accuracy": 0.6891427770256996,
2614
+ "num_tokens": 3095354.0,
2615
+ "step": 2750
2616
+ },
2617
+ {
2618
+ "epoch": 2.453333333333333,
2619
+ "grad_norm": 1.930341362953186,
2620
+ "learning_rate": 3.774045801526718e-05,
2621
+ "loss": 1.5648,
2622
+ "mean_token_accuracy": 0.6962095096707344,
2623
+ "num_tokens": 3106679.0,
2624
+ "step": 2760
2625
+ },
2626
+ {
2627
+ "epoch": 2.462222222222222,
2628
+ "grad_norm": 2.1718850135803223,
2629
+ "learning_rate": 3.7129770992366416e-05,
2630
+ "loss": 1.5514,
2631
+ "mean_token_accuracy": 0.6997211873531342,
2632
+ "num_tokens": 3117440.0,
2633
+ "step": 2770
2634
+ },
2635
+ {
2636
+ "epoch": 2.471111111111111,
2637
+ "grad_norm": 1.89506196975708,
2638
+ "learning_rate": 3.651908396946565e-05,
2639
+ "loss": 1.6102,
2640
+ "mean_token_accuracy": 0.6865462198853493,
2641
+ "num_tokens": 3128685.0,
2642
+ "step": 2780
2643
+ },
2644
+ {
2645
+ "epoch": 2.48,
2646
+ "grad_norm": 2.1102652549743652,
2647
+ "learning_rate": 3.590839694656489e-05,
2648
+ "loss": 1.6092,
2649
+ "mean_token_accuracy": 0.6845578849315643,
2650
+ "num_tokens": 3140574.0,
2651
+ "step": 2790
2652
+ },
2653
+ {
2654
+ "epoch": 2.488888888888889,
2655
+ "grad_norm": 1.9541523456573486,
2656
+ "learning_rate": 3.5297709923664124e-05,
2657
+ "loss": 1.6245,
2658
+ "mean_token_accuracy": 0.6867643877863884,
2659
+ "num_tokens": 3151937.0,
2660
+ "step": 2800
2661
+ },
2662
+ {
2663
+ "epoch": 2.488888888888889,
2664
+ "eval_loss": 1.8869248628616333,
2665
+ "eval_mean_token_accuracy": 0.6508636207580566,
2666
+ "eval_num_tokens": 3151937.0,
2667
+ "eval_runtime": 46.9872,
2668
+ "eval_samples_per_second": 21.282,
2669
+ "eval_steps_per_second": 10.641,
2670
+ "step": 2800
2671
+ },
2672
+ {
2673
+ "epoch": 2.497777777777778,
2674
+ "grad_norm": 2.006448984146118,
2675
+ "learning_rate": 3.468702290076336e-05,
2676
+ "loss": 1.6458,
2677
+ "mean_token_accuracy": 0.6835160732269288,
2678
+ "num_tokens": 3163343.0,
2679
+ "step": 2810
2680
+ },
2681
+ {
2682
+ "epoch": 2.506666666666667,
2683
+ "grad_norm": 2.0644562244415283,
2684
+ "learning_rate": 3.4076335877862595e-05,
2685
+ "loss": 1.5841,
2686
+ "mean_token_accuracy": 0.699130979180336,
2687
+ "num_tokens": 3174278.0,
2688
+ "step": 2820
2689
+ },
2690
+ {
2691
+ "epoch": 2.5155555555555553,
2692
+ "grad_norm": 2.5352766513824463,
2693
+ "learning_rate": 3.346564885496183e-05,
2694
+ "loss": 1.6411,
2695
+ "mean_token_accuracy": 0.687686163187027,
2696
+ "num_tokens": 3185529.0,
2697
+ "step": 2830
2698
+ },
2699
+ {
2700
+ "epoch": 2.5244444444444447,
2701
+ "grad_norm": 2.2506706714630127,
2702
+ "learning_rate": 3.2854961832061074e-05,
2703
+ "loss": 1.5334,
2704
+ "mean_token_accuracy": 0.7042266175150871,
2705
+ "num_tokens": 3196422.0,
2706
+ "step": 2840
2707
+ },
2708
+ {
2709
+ "epoch": 2.533333333333333,
2710
+ "grad_norm": 2.038456439971924,
2711
+ "learning_rate": 3.224427480916031e-05,
2712
+ "loss": 1.5226,
2713
+ "mean_token_accuracy": 0.7002356797456741,
2714
+ "num_tokens": 3207640.0,
2715
+ "step": 2850
2716
+ },
2717
+ {
2718
+ "epoch": 2.542222222222222,
2719
+ "grad_norm": 2.0818448066711426,
2720
+ "learning_rate": 3.1633587786259545e-05,
2721
+ "loss": 1.5136,
2722
+ "mean_token_accuracy": 0.7040936380624772,
2723
+ "num_tokens": 3218742.0,
2724
+ "step": 2860
2725
+ },
2726
+ {
2727
+ "epoch": 2.551111111111111,
2728
+ "grad_norm": 1.9810820817947388,
2729
+ "learning_rate": 3.102290076335878e-05,
2730
+ "loss": 1.6515,
2731
+ "mean_token_accuracy": 0.6826088905334473,
2732
+ "num_tokens": 3230062.0,
2733
+ "step": 2870
2734
+ },
2735
+ {
2736
+ "epoch": 2.56,
2737
+ "grad_norm": 2.1830689907073975,
2738
+ "learning_rate": 3.0412213740458017e-05,
2739
+ "loss": 1.5792,
2740
+ "mean_token_accuracy": 0.699496129155159,
2741
+ "num_tokens": 3240533.0,
2742
+ "step": 2880
2743
+ },
2744
+ {
2745
+ "epoch": 2.568888888888889,
2746
+ "grad_norm": 2.101184368133545,
2747
+ "learning_rate": 2.9801526717557253e-05,
2748
+ "loss": 1.6538,
2749
+ "mean_token_accuracy": 0.6724523141980171,
2750
+ "num_tokens": 3252476.0,
2751
+ "step": 2890
2752
+ },
2753
+ {
2754
+ "epoch": 2.5777777777777775,
2755
+ "grad_norm": 2.021524429321289,
2756
+ "learning_rate": 2.9190839694656492e-05,
2757
+ "loss": 1.6146,
2758
+ "mean_token_accuracy": 0.6886414483189582,
2759
+ "num_tokens": 3263799.0,
2760
+ "step": 2900
2761
+ },
2762
+ {
2763
+ "epoch": 2.586666666666667,
2764
+ "grad_norm": 1.9668735265731812,
2765
+ "learning_rate": 2.8580152671755728e-05,
2766
+ "loss": 1.6477,
2767
+ "mean_token_accuracy": 0.678925508260727,
2768
+ "num_tokens": 3275511.0,
2769
+ "step": 2910
2770
+ },
2771
+ {
2772
+ "epoch": 2.5955555555555554,
2773
+ "grad_norm": 2.088491201400757,
2774
+ "learning_rate": 2.7969465648854964e-05,
2775
+ "loss": 1.6265,
2776
+ "mean_token_accuracy": 0.6857595339417457,
2777
+ "num_tokens": 3286752.0,
2778
+ "step": 2920
2779
+ },
2780
+ {
2781
+ "epoch": 2.6044444444444443,
2782
+ "grad_norm": 2.0536880493164062,
2783
+ "learning_rate": 2.73587786259542e-05,
2784
+ "loss": 1.66,
2785
+ "mean_token_accuracy": 0.681273227930069,
2786
+ "num_tokens": 3297945.0,
2787
+ "step": 2930
2788
+ },
2789
+ {
2790
+ "epoch": 2.6133333333333333,
2791
+ "grad_norm": 2.0063817501068115,
2792
+ "learning_rate": 2.674809160305344e-05,
2793
+ "loss": 1.5102,
2794
+ "mean_token_accuracy": 0.7025244757533073,
2795
+ "num_tokens": 3309112.0,
2796
+ "step": 2940
2797
+ },
2798
+ {
2799
+ "epoch": 2.6222222222222222,
2800
+ "grad_norm": 1.9980206489562988,
2801
+ "learning_rate": 2.6137404580152675e-05,
2802
+ "loss": 1.5142,
2803
+ "mean_token_accuracy": 0.7049572348594666,
2804
+ "num_tokens": 3320544.0,
2805
+ "step": 2950
2806
+ },
2807
+ {
2808
+ "epoch": 2.631111111111111,
2809
+ "grad_norm": 2.1506435871124268,
2810
+ "learning_rate": 2.552671755725191e-05,
2811
+ "loss": 1.5826,
2812
+ "mean_token_accuracy": 0.694467018544674,
2813
+ "num_tokens": 3331309.0,
2814
+ "step": 2960
2815
+ },
2816
+ {
2817
+ "epoch": 2.64,
2818
+ "grad_norm": 1.9890793561935425,
2819
+ "learning_rate": 2.4916030534351147e-05,
2820
+ "loss": 1.5631,
2821
+ "mean_token_accuracy": 0.6945617944002151,
2822
+ "num_tokens": 3343068.0,
2823
+ "step": 2970
2824
+ },
2825
+ {
2826
+ "epoch": 2.648888888888889,
2827
+ "grad_norm": 2.1102676391601562,
2828
+ "learning_rate": 2.4305343511450383e-05,
2829
+ "loss": 1.6145,
2830
+ "mean_token_accuracy": 0.6866093754768372,
2831
+ "num_tokens": 3354691.0,
2832
+ "step": 2980
2833
+ },
2834
+ {
2835
+ "epoch": 2.6577777777777776,
2836
+ "grad_norm": 2.2881674766540527,
2837
+ "learning_rate": 2.369465648854962e-05,
2838
+ "loss": 1.5796,
2839
+ "mean_token_accuracy": 0.6961612686514854,
2840
+ "num_tokens": 3365512.0,
2841
+ "step": 2990
2842
+ },
2843
+ {
2844
+ "epoch": 2.6666666666666665,
2845
+ "grad_norm": 1.973838210105896,
2846
+ "learning_rate": 2.3083969465648854e-05,
2847
+ "loss": 1.5456,
2848
+ "mean_token_accuracy": 0.703473174571991,
2849
+ "num_tokens": 3376406.0,
2850
+ "step": 3000
2851
+ },
2852
+ {
2853
+ "epoch": 2.6666666666666665,
2854
+ "eval_loss": 1.881131649017334,
2855
+ "eval_mean_token_accuracy": 0.6518214672803879,
2856
+ "eval_num_tokens": 3376406.0,
2857
+ "eval_runtime": 47.794,
2858
+ "eval_samples_per_second": 20.923,
2859
+ "eval_steps_per_second": 10.462,
2860
+ "step": 3000
2861
+ },
2862
+ {
2863
+ "epoch": 2.6755555555555555,
2864
+ "grad_norm": 1.9779133796691895,
2865
+ "learning_rate": 2.2473282442748094e-05,
2866
+ "loss": 1.6538,
2867
+ "mean_token_accuracy": 0.6778925880789757,
2868
+ "num_tokens": 3388024.0,
2869
+ "step": 3010
2870
+ },
2871
+ {
2872
+ "epoch": 2.6844444444444444,
2873
+ "grad_norm": 1.848136305809021,
2874
+ "learning_rate": 2.186259541984733e-05,
2875
+ "loss": 1.5608,
2876
+ "mean_token_accuracy": 0.6985713213682174,
2877
+ "num_tokens": 3399547.0,
2878
+ "step": 3020
2879
+ },
2880
+ {
2881
+ "epoch": 2.6933333333333334,
2882
+ "grad_norm": 2.101651191711426,
2883
+ "learning_rate": 2.1251908396946565e-05,
2884
+ "loss": 1.5501,
2885
+ "mean_token_accuracy": 0.6979974433779716,
2886
+ "num_tokens": 3410179.0,
2887
+ "step": 3030
2888
+ },
2889
+ {
2890
+ "epoch": 2.7022222222222223,
2891
+ "grad_norm": 1.8398933410644531,
2892
+ "learning_rate": 2.06412213740458e-05,
2893
+ "loss": 1.5843,
2894
+ "mean_token_accuracy": 0.6883544474840164,
2895
+ "num_tokens": 3421454.0,
2896
+ "step": 3040
2897
+ },
2898
+ {
2899
+ "epoch": 2.7111111111111112,
2900
+ "grad_norm": 2.011132001876831,
2901
+ "learning_rate": 2.003053435114504e-05,
2902
+ "loss": 1.6012,
2903
+ "mean_token_accuracy": 0.6917843446135521,
2904
+ "num_tokens": 3432951.0,
2905
+ "step": 3050
2906
+ },
2907
+ {
2908
+ "epoch": 2.7199999999999998,
2909
+ "grad_norm": 2.005140542984009,
2910
+ "learning_rate": 1.9419847328244276e-05,
2911
+ "loss": 1.5421,
2912
+ "mean_token_accuracy": 0.6976893007755279,
2913
+ "num_tokens": 3444007.0,
2914
+ "step": 3060
2915
+ },
2916
+ {
2917
+ "epoch": 2.728888888888889,
2918
+ "grad_norm": 2.146664619445801,
2919
+ "learning_rate": 1.8809160305343512e-05,
2920
+ "loss": 1.5799,
2921
+ "mean_token_accuracy": 0.6956974431872368,
2922
+ "num_tokens": 3455510.0,
2923
+ "step": 3070
2924
+ },
2925
+ {
2926
+ "epoch": 2.7377777777777776,
2927
+ "grad_norm": 2.0788283348083496,
2928
+ "learning_rate": 1.8198473282442748e-05,
2929
+ "loss": 1.6043,
2930
+ "mean_token_accuracy": 0.6913327068090439,
2931
+ "num_tokens": 3466684.0,
2932
+ "step": 3080
2933
+ },
2934
+ {
2935
+ "epoch": 2.7466666666666666,
2936
+ "grad_norm": 1.8829123973846436,
2937
+ "learning_rate": 1.7587786259541984e-05,
2938
+ "loss": 1.5649,
2939
+ "mean_token_accuracy": 0.6947105377912521,
2940
+ "num_tokens": 3477804.0,
2941
+ "step": 3090
2942
+ },
2943
+ {
2944
+ "epoch": 2.7555555555555555,
2945
+ "grad_norm": 1.9475817680358887,
2946
+ "learning_rate": 1.6977099236641223e-05,
2947
+ "loss": 1.5568,
2948
+ "mean_token_accuracy": 0.7034636497497558,
2949
+ "num_tokens": 3488846.0,
2950
+ "step": 3100
2951
+ },
2952
+ {
2953
+ "epoch": 2.7644444444444445,
2954
+ "grad_norm": 2.098478317260742,
2955
+ "learning_rate": 1.636641221374046e-05,
2956
+ "loss": 1.5575,
2957
+ "mean_token_accuracy": 0.7053634539246559,
2958
+ "num_tokens": 3499405.0,
2959
+ "step": 3110
2960
+ },
2961
+ {
2962
+ "epoch": 2.7733333333333334,
2963
+ "grad_norm": 2.041572093963623,
2964
+ "learning_rate": 1.5755725190839695e-05,
2965
+ "loss": 1.619,
2966
+ "mean_token_accuracy": 0.6887963160872459,
2967
+ "num_tokens": 3511004.0,
2968
+ "step": 3120
2969
+ },
2970
+ {
2971
+ "epoch": 2.7822222222222224,
2972
+ "grad_norm": 2.0892608165740967,
2973
+ "learning_rate": 1.5145038167938933e-05,
2974
+ "loss": 1.55,
2975
+ "mean_token_accuracy": 0.6963776037096977,
2976
+ "num_tokens": 3521755.0,
2977
+ "step": 3130
2978
+ },
2979
+ {
2980
+ "epoch": 2.7911111111111113,
2981
+ "grad_norm": 1.9754984378814697,
2982
+ "learning_rate": 1.4534351145038168e-05,
2983
+ "loss": 1.5459,
2984
+ "mean_token_accuracy": 0.7077917411923409,
2985
+ "num_tokens": 3532621.0,
2986
+ "step": 3140
2987
+ },
2988
+ {
2989
+ "epoch": 2.8,
2990
+ "grad_norm": 1.9490447044372559,
2991
+ "learning_rate": 1.3923664122137406e-05,
2992
+ "loss": 1.6047,
2993
+ "mean_token_accuracy": 0.6932125955820083,
2994
+ "num_tokens": 3543418.0,
2995
+ "step": 3150
2996
+ },
2997
+ {
2998
+ "epoch": 2.8088888888888888,
2999
+ "grad_norm": 2.12741756439209,
3000
+ "learning_rate": 1.3312977099236642e-05,
3001
+ "loss": 1.6336,
3002
+ "mean_token_accuracy": 0.6868860185146332,
3003
+ "num_tokens": 3555172.0,
3004
+ "step": 3160
3005
+ },
3006
+ {
3007
+ "epoch": 2.8177777777777777,
3008
+ "grad_norm": 1.9473916292190552,
3009
+ "learning_rate": 1.270229007633588e-05,
3010
+ "loss": 1.5765,
3011
+ "mean_token_accuracy": 0.696508777141571,
3012
+ "num_tokens": 3565975.0,
3013
+ "step": 3170
3014
+ },
3015
+ {
3016
+ "epoch": 2.8266666666666667,
3017
+ "grad_norm": 2.065030336380005,
3018
+ "learning_rate": 1.2091603053435115e-05,
3019
+ "loss": 1.6127,
3020
+ "mean_token_accuracy": 0.6915735498070716,
3021
+ "num_tokens": 3578154.0,
3022
+ "step": 3180
3023
+ },
3024
+ {
3025
+ "epoch": 2.8355555555555556,
3026
+ "grad_norm": 2.1202714443206787,
3027
+ "learning_rate": 1.1480916030534351e-05,
3028
+ "loss": 1.5786,
3029
+ "mean_token_accuracy": 0.702069939672947,
3030
+ "num_tokens": 3589470.0,
3031
+ "step": 3190
3032
+ },
3033
+ {
3034
+ "epoch": 2.8444444444444446,
3035
+ "grad_norm": 2.081028699874878,
3036
+ "learning_rate": 1.0870229007633589e-05,
3037
+ "loss": 1.6146,
3038
+ "mean_token_accuracy": 0.6874286815524101,
3039
+ "num_tokens": 3600489.0,
3040
+ "step": 3200
3041
+ },
3042
+ {
3043
+ "epoch": 2.8444444444444446,
3044
+ "eval_loss": 1.8764336109161377,
3045
+ "eval_mean_token_accuracy": 0.6531118412017822,
3046
+ "eval_num_tokens": 3600489.0,
3047
+ "eval_runtime": 47.0874,
3048
+ "eval_samples_per_second": 21.237,
3049
+ "eval_steps_per_second": 10.619,
3050
+ "step": 3200
3051
+ },
3052
+ {
3053
+ "epoch": 2.8533333333333335,
3054
+ "grad_norm": 2.002845048904419,
3055
+ "learning_rate": 1.0259541984732825e-05,
3056
+ "loss": 1.5998,
3057
+ "mean_token_accuracy": 0.6930819883942604,
3058
+ "num_tokens": 3611438.0,
3059
+ "step": 3210
3060
+ },
3061
+ {
3062
+ "epoch": 2.862222222222222,
3063
+ "grad_norm": 1.967205286026001,
3064
+ "learning_rate": 9.648854961832062e-06,
3065
+ "loss": 1.5121,
3066
+ "mean_token_accuracy": 0.7083013087511063,
3067
+ "num_tokens": 3622326.0,
3068
+ "step": 3220
3069
+ },
3070
+ {
3071
+ "epoch": 2.871111111111111,
3072
+ "grad_norm": 1.9093670845031738,
3073
+ "learning_rate": 9.038167938931298e-06,
3074
+ "loss": 1.5712,
3075
+ "mean_token_accuracy": 0.6914584785699844,
3076
+ "num_tokens": 3633132.0,
3077
+ "step": 3230
3078
+ },
3079
+ {
3080
+ "epoch": 2.88,
3081
+ "grad_norm": 2.0666589736938477,
3082
+ "learning_rate": 8.427480916030536e-06,
3083
+ "loss": 1.6203,
3084
+ "mean_token_accuracy": 0.6882967233657837,
3085
+ "num_tokens": 3644569.0,
3086
+ "step": 3240
3087
+ },
3088
+ {
3089
+ "epoch": 2.888888888888889,
3090
+ "grad_norm": 2.0188019275665283,
3091
+ "learning_rate": 7.816793893129771e-06,
3092
+ "loss": 1.5133,
3093
+ "mean_token_accuracy": 0.7055545896291733,
3094
+ "num_tokens": 3655650.0,
3095
+ "step": 3250
3096
+ },
3097
+ {
3098
+ "epoch": 2.897777777777778,
3099
+ "grad_norm": 1.9436832666397095,
3100
+ "learning_rate": 7.206106870229008e-06,
3101
+ "loss": 1.5754,
3102
+ "mean_token_accuracy": 0.6883370772004127,
3103
+ "num_tokens": 3667338.0,
3104
+ "step": 3260
3105
+ },
3106
+ {
3107
+ "epoch": 2.9066666666666667,
3108
+ "grad_norm": 1.960017442703247,
3109
+ "learning_rate": 6.595419847328245e-06,
3110
+ "loss": 1.6513,
3111
+ "mean_token_accuracy": 0.6853567749261856,
3112
+ "num_tokens": 3678543.0,
3113
+ "step": 3270
3114
+ },
3115
+ {
3116
+ "epoch": 2.9155555555555557,
3117
+ "grad_norm": 1.8537602424621582,
3118
+ "learning_rate": 5.984732824427481e-06,
3119
+ "loss": 1.6711,
3120
+ "mean_token_accuracy": 0.6820794567465782,
3121
+ "num_tokens": 3690401.0,
3122
+ "step": 3280
3123
+ },
3124
+ {
3125
+ "epoch": 2.924444444444444,
3126
+ "grad_norm": 1.9544005393981934,
3127
+ "learning_rate": 5.3740458015267174e-06,
3128
+ "loss": 1.5983,
3129
+ "mean_token_accuracy": 0.6836786240339279,
3130
+ "num_tokens": 3702001.0,
3131
+ "step": 3290
3132
+ },
3133
+ {
3134
+ "epoch": 2.9333333333333336,
3135
+ "grad_norm": 2.035642147064209,
3136
+ "learning_rate": 4.763358778625954e-06,
3137
+ "loss": 1.6383,
3138
+ "mean_token_accuracy": 0.6839755535125732,
3139
+ "num_tokens": 3713114.0,
3140
+ "step": 3300
3141
+ },
3142
+ {
3143
+ "epoch": 2.942222222222222,
3144
+ "grad_norm": 1.863014578819275,
3145
+ "learning_rate": 4.152671755725191e-06,
3146
+ "loss": 1.6363,
3147
+ "mean_token_accuracy": 0.6912973523139954,
3148
+ "num_tokens": 3724432.0,
3149
+ "step": 3310
3150
+ },
3151
+ {
3152
+ "epoch": 2.951111111111111,
3153
+ "grad_norm": 2.1031157970428467,
3154
+ "learning_rate": 3.541984732824428e-06,
3155
+ "loss": 1.6434,
3156
+ "mean_token_accuracy": 0.6855254426598549,
3157
+ "num_tokens": 3735522.0,
3158
+ "step": 3320
3159
+ },
3160
+ {
3161
+ "epoch": 2.96,
3162
+ "grad_norm": 1.9777454137802124,
3163
+ "learning_rate": 2.9312977099236643e-06,
3164
+ "loss": 1.6118,
3165
+ "mean_token_accuracy": 0.6878430411219597,
3166
+ "num_tokens": 3746892.0,
3167
+ "step": 3330
3168
+ },
3169
+ {
3170
+ "epoch": 2.968888888888889,
3171
+ "grad_norm": 1.947704553604126,
3172
+ "learning_rate": 2.320610687022901e-06,
3173
+ "loss": 1.6083,
3174
+ "mean_token_accuracy": 0.6893500313162804,
3175
+ "num_tokens": 3757984.0,
3176
+ "step": 3340
3177
+ },
3178
+ {
3179
+ "epoch": 2.977777777777778,
3180
+ "grad_norm": 2.1452696323394775,
3181
+ "learning_rate": 1.7099236641221375e-06,
3182
+ "loss": 1.5432,
3183
+ "mean_token_accuracy": 0.699726614356041,
3184
+ "num_tokens": 3768945.0,
3185
+ "step": 3350
3186
+ },
3187
+ {
3188
+ "epoch": 2.986666666666667,
3189
+ "grad_norm": 1.9867252111434937,
3190
+ "learning_rate": 1.099236641221374e-06,
3191
+ "loss": 1.6106,
3192
+ "mean_token_accuracy": 0.6885504856705665,
3193
+ "num_tokens": 3780272.0,
3194
+ "step": 3360
3195
+ },
3196
+ {
3197
+ "epoch": 2.9955555555555557,
3198
+ "grad_norm": 2.084091901779175,
3199
+ "learning_rate": 4.885496183206107e-07,
3200
+ "loss": 1.6304,
3201
+ "mean_token_accuracy": 0.6875804170966149,
3202
+ "num_tokens": 3791870.0,
3203
+ "step": 3370
3204
+ }
3205
+ ],
3206
+ "logging_steps": 10,
3207
+ "max_steps": 3375,
3208
+ "num_input_tokens_seen": 0,
3209
+ "num_train_epochs": 3,
3210
+ "save_steps": 200,
3211
+ "stateful_callbacks": {
3212
+ "TrainerControl": {
3213
+ "args": {
3214
+ "should_epoch_stop": false,
3215
+ "should_evaluate": false,
3216
+ "should_log": false,
3217
+ "should_save": true,
3218
+ "should_training_stop": true
3219
+ },
3220
+ "attributes": {}
3221
+ }
3222
+ },
3223
+ "total_flos": 1.2042984005197824e+16,
3224
+ "train_batch_size": 2,
3225
+ "trial_name": null,
3226
+ "trial_params": null
3227
+ }
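The trainer_state.json added above stores every logged metric as a flat list of dicts under `log_history`, with evaluation entries distinguished by `eval_*` keys. A minimal sketch of pulling the latest eval loss out of such a state (the inline sample mirrors two entries logged above; normally you would `json.load()` the checkpoint file itself):

```python
import json

# trainer_state.json keeps metrics as a list of dicts under "log_history";
# eval records carry an "eval_loss" key, train records a "loss" key.
# Sample below copies values from the log above (step 3200 eval, step 3370 train).
state = json.loads("""
{
  "log_history": [
    {"step": 3200, "epoch": 2.8444444444444446, "eval_loss": 1.8764336109161377},
    {"step": 3370, "epoch": 2.9955555555555557, "loss": 1.6304}
  ]
}
""")

# Filter to evaluation entries and take the most recent one.
eval_entries = [e for e in state["log_history"] if "eval_loss" in e]
last_eval = eval_entries[-1]
print(last_eval["step"], round(last_eval["eval_loss"], 4))
```

In a real checkpoint the list holds one dict per `logging_steps` interval (here 10 steps) plus one per evaluation, so the same filter scales to the full 3375-step run.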
checkpoint-3375/training_args.bin ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:ef7dcfbb520eaa72697f1d45b91e159189ecefb58152a35a5fa5a95eb7d53aa9
3
+ size 5624
checkpoint-3375/vocab.json ADDED
The diff for this file is too large to render. See raw diff
 
merges.txt ADDED
The diff for this file is too large to render. See raw diff
 
special_tokens_map.json ADDED
@@ -0,0 +1,86 @@
1
+ {
2
+ "additional_special_tokens": [
3
+ "<|endoftext|>",
4
+ "<|fim_prefix|>",
5
+ "<|fim_middle|>",
6
+ "<|fim_suffix|>",
7
+ "<|endofprompt|>",
8
+ "<|_unuse_missing_100256|>",
9
+ "<|_unuse_missing_100261|>",
10
+ "<|_unuse_missing_100262|>",
11
+ "<|_unuse_missing_100263|>",
12
+ "<|_unuse_missing_100264|>",
13
+ "<|_unuse_missing_100265|>",
14
+ "<|_unuse_missing_100266|>",
15
+ "<|_unuse_missing_100267|>",
16
+ "<|_unuse_missing_100268|>",
17
+ "<|_unuse_missing_100269|>",
18
+ "<|_unuse_missing_100270|>",
19
+ "<|_unuse_missing_100271|>",
20
+ "<|im_start|>",
21
+ "<|im_end|>",
22
+ "<|stop|>",
23
+ "<|endofturn|>",
24
+ "<repo_name>",
25
+ "<file_sep>",
26
+ "<issue_start>",
27
+ "<issue_comment>",
28
+ "<issue_closed>",
29
+ "<jupyter_start>",
30
+ "<jupyter_text>",
31
+ "<jupyter_code>",
32
+ "<jupyter_output>",
33
+ "<jupyter_script>",
34
+ "<empty_output>",
35
+ "<code_to_intermediate>",
36
+ "<intermediate_to_code>",
37
+ "<pr>",
38
+ "<pr_status>",
39
+ "<pr_is_merged>",
40
+ "<pr_base>",
41
+ "<pr_file>",
42
+ "<pr_base_code>",
43
+ "<pr_diff>",
44
+ "<pr_diff_hunk>",
45
+ "<pr_comment>",
46
+ "<pr_event_id>",
47
+ "<pr_review>",
48
+ "<pr_review_state>",
49
+ "<pr_review_comment>",
50
+ "<pr_in_reply_to_review_id>",
51
+ "<pr_in_reply_to_comment_id>",
52
+ "<pr_diff_hunk_comment_line>",
53
+ "<NAME>",
54
+ "<EMAIL>",
55
+ "<KEY>",
56
+ "<PASSWORD>"
57
+ ],
58
+ "bos_token": {
59
+ "content": "<|endoftext|>",
60
+ "lstrip": false,
61
+ "normalized": false,
62
+ "rstrip": false,
63
+ "single_word": false
64
+ },
65
+ "eos_token": {
66
+ "content": "<|endofturn|>",
67
+ "lstrip": false,
68
+ "normalized": false,
69
+ "rstrip": false,
70
+ "single_word": false
71
+ },
72
+ "pad_token": {
73
+ "content": "<|endoftext|>",
74
+ "lstrip": false,
75
+ "normalized": false,
76
+ "rstrip": false,
77
+ "single_word": false
78
+ },
79
+ "unk_token": {
80
+ "content": "<|endoftext|>",
81
+ "lstrip": false,
82
+ "normalized": false,
83
+ "rstrip": false,
84
+ "single_word": false
85
+ }
86
+ }
tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
 
tokenizer_config.json ADDED
@@ -0,0 +1,501 @@
1
+ {
2
+ "add_prefix_space": false,
3
+ "added_tokens_decoder": {
4
+ "100256": {
5
+ "content": "<|_unuse_missing_100256|>",
6
+ "lstrip": false,
7
+ "normalized": false,
8
+ "rstrip": false,
9
+ "single_word": false,
10
+ "special": true
11
+ },
12
+ "100257": {
13
+ "content": "<|endoftext|>",
14
+ "lstrip": false,
15
+ "normalized": false,
16
+ "rstrip": false,
17
+ "single_word": false,
18
+ "special": true
19
+ },
20
+ "100258": {
21
+ "content": "<|fim_prefix|>",
22
+ "lstrip": false,
23
+ "normalized": false,
24
+ "rstrip": false,
25
+ "single_word": false,
26
+ "special": true
27
+ },
28
+ "100259": {
29
+ "content": "<|fim_middle|>",
30
+ "lstrip": false,
31
+ "normalized": false,
32
+ "rstrip": false,
33
+ "single_word": false,
34
+ "special": true
35
+ },
36
+ "100260": {
37
+ "content": "<|fim_suffix|>",
38
+ "lstrip": false,
39
+ "normalized": false,
40
+ "rstrip": false,
41
+ "single_word": false,
42
+ "special": true
43
+ },
44
+ "100261": {
45
+ "content": "<|_unuse_missing_100261|>",
46
+ "lstrip": false,
47
+ "normalized": false,
48
+ "rstrip": false,
49
+ "single_word": false,
50
+ "special": true
51
+ },
52
+ "100262": {
53
+ "content": "<|_unuse_missing_100262|>",
54
+ "lstrip": false,
55
+ "normalized": false,
56
+ "rstrip": false,
57
+ "single_word": false,
58
+ "special": true
59
+ },
60
+ "100263": {
61
+ "content": "<|_unuse_missing_100263|>",
62
+ "lstrip": false,
63
+ "normalized": false,
64
+ "rstrip": false,
65
+ "single_word": false,
66
+ "special": true
67
+ },
68
+ "100264": {
69
+ "content": "<|_unuse_missing_100264|>",
70
+ "lstrip": false,
71
+ "normalized": false,
72
+ "rstrip": false,
73
+ "single_word": false,
74
+ "special": true
75
+ },
76
+ "100265": {
77
+ "content": "<|_unuse_missing_100265|>",
78
+ "lstrip": false,
79
+ "normalized": false,
80
+ "rstrip": false,
81
+ "single_word": false,
82
+ "special": true
83
+ },
84
+ "100266": {
85
+ "content": "<|_unuse_missing_100266|>",
86
+ "lstrip": false,
87
+ "normalized": false,
88
+ "rstrip": false,
89
+ "single_word": false,
90
+ "special": true
91
+ },
92
+ "100267": {
93
+ "content": "<|_unuse_missing_100267|>",
94
+ "lstrip": false,
95
+ "normalized": false,
96
+ "rstrip": false,
97
+ "single_word": false,
98
+ "special": true
99
+ },
100
+ "100268": {
101
+ "content": "<|_unuse_missing_100268|>",
102
+ "lstrip": false,
103
+ "normalized": false,
104
+ "rstrip": false,
105
+ "single_word": false,
106
+ "special": true
107
+ },
108
+ "100269": {
109
+ "content": "<|_unuse_missing_100269|>",
110
+ "lstrip": false,
111
+ "normalized": false,
112
+ "rstrip": false,
113
+ "single_word": false,
114
+ "special": true
115
+ },
116
+ "100270": {
117
+ "content": "<|_unuse_missing_100270|>",
118
+ "lstrip": false,
119
+ "normalized": false,
120
+ "rstrip": false,
121
+ "single_word": false,
122
+ "special": true
123
+ },
124
+ "100271": {
125
+ "content": "<|_unuse_missing_100271|>",
126
+ "lstrip": false,
127
+ "normalized": false,
128
+ "rstrip": false,
129
+ "single_word": false,
130
+ "special": true
131
+ },
132
+ "100272": {
133
+ "content": "<|im_start|>",
134
+ "lstrip": false,
135
+ "normalized": false,
136
+ "rstrip": false,
137
+ "single_word": false,
138
+ "special": true
139
+ },
140
+ "100273": {
141
+ "content": "<|im_end|>",
142
+ "lstrip": false,
143
+ "normalized": false,
144
+ "rstrip": false,
145
+ "single_word": false,
146
+ "special": true
147
+ },
148
+ "100274": {
149
+ "content": "<|stop|>",
150
+ "lstrip": false,
151
+ "normalized": false,
152
+ "rstrip": false,
153
+ "single_word": false,
154
+ "special": true
155
+ },
156
+ "100275": {
157
+ "content": "<|endofturn|>",
158
+ "lstrip": false,
159
+ "normalized": false,
160
+ "rstrip": false,
161
+ "single_word": false,
162
+ "special": true
163
+ },
164
+ "100276": {
165
+ "content": "<|endofprompt|>",
166
+ "lstrip": false,
167
+ "normalized": false,
168
+ "rstrip": false,
169
+ "single_word": false,
170
+ "special": true
171
+ },
172
+ "110491": {
173
+ "content": "<repo_name>",
174
+ "lstrip": false,
175
+ "normalized": false,
176
+ "rstrip": false,
177
+ "single_word": false,
178
+ "special": true
179
+ },
180
+ "110492": {
181
+ "content": "<file_sep>",
182
+ "lstrip": false,
183
+ "normalized": false,
184
+ "rstrip": false,
185
+ "single_word": false,
186
+ "special": true
187
+ },
188
+ "110493": {
189
+ "content": "<issue_start>",
190
+ "lstrip": false,
191
+ "normalized": false,
192
+ "rstrip": false,
193
+ "single_word": false,
194
+ "special": true
195
+ },
196
+ "110494": {
197
+ "content": "<issue_comment>",
198
+ "lstrip": false,
199
+ "normalized": false,
200
+ "rstrip": false,
201
+ "single_word": false,
202
+ "special": true
203
+ },
204
+ "110495": {
205
+ "content": "<issue_closed>",
206
+ "lstrip": false,
207
+ "normalized": false,
208
+ "rstrip": false,
209
+ "single_word": false,
210
+ "special": true
211
+ },
212
+ "110496": {
213
+ "content": "<jupyter_start>",
214
+ "lstrip": false,
215
+ "normalized": false,
216
+ "rstrip": false,
217
+ "single_word": false,
218
+ "special": true
219
+ },
220
+ "110497": {
221
+ "content": "<jupyter_text>",
222
+ "lstrip": false,
223
+ "normalized": false,
224
+ "rstrip": false,
225
+ "single_word": false,
226
+ "special": true
227
+ },
228
+ "110498": {
229
+ "content": "<jupyter_code>",
230
+ "lstrip": false,
231
+ "normalized": false,
232
+ "rstrip": false,
233
+ "single_word": false,
234
+ "special": true
235
+ },
236
+ "110499": {
237
+ "content": "<jupyter_output>",
238
+ "lstrip": false,
239
+ "normalized": false,
240
+ "rstrip": false,
241
+ "single_word": false,
242
+ "special": true
243
+ },
244
+ "110500": {
245
+ "content": "<jupyter_script>",
246
+ "lstrip": false,
247
+ "normalized": false,
248
+ "rstrip": false,
249
+ "single_word": false,
250
+ "special": true
251
+ },
252
+ "110501": {
253
+ "content": "<empty_output>",
254
+ "lstrip": false,
255
+ "normalized": false,
256
+ "rstrip": false,
257
+ "single_word": false,
258
+ "special": true
259
+ },
260
+ "110502": {
261
+ "content": "<code_to_intermediate>",
262
+ "lstrip": false,
263
+ "normalized": false,
264
+ "rstrip": false,
265
+ "single_word": false,
266
+ "special": true
267
+ },
268
+ "110503": {
269
+ "content": "<intermediate_to_code>",
270
+ "lstrip": false,
271
+ "normalized": false,
272
+ "rstrip": false,
273
+ "single_word": false,
274
+ "special": true
275
+ },
276
+ "110504": {
277
+ "content": "<pr>",
278
+ "lstrip": false,
279
+ "normalized": false,
280
+ "rstrip": false,
281
+ "single_word": false,
282
+ "special": true
283
+ },
284
+ "110505": {
285
+ "content": "<pr_status>",
286
+ "lstrip": false,
287
+ "normalized": false,
288
+ "rstrip": false,
289
+ "single_word": false,
290
+ "special": true
291
+ },
292
+ "110506": {
293
+ "content": "<pr_is_merged>",
294
+ "lstrip": false,
295
+ "normalized": false,
296
+ "rstrip": false,
297
+ "single_word": false,
298
+ "special": true
299
+ },
300
+ "110507": {
301
+ "content": "<pr_base>",
302
+ "lstrip": false,
303
+ "normalized": false,
304
+ "rstrip": false,
305
+ "single_word": false,
306
+ "special": true
307
+ },
308
+ "110508": {
309
+ "content": "<pr_file>",
310
+ "lstrip": false,
311
+ "normalized": false,
312
+ "rstrip": false,
313
+ "single_word": false,
314
+ "special": true
315
+ },
316
+ "110509": {
317
+ "content": "<pr_base_code>",
318
+ "lstrip": false,
319
+ "normalized": false,
320
+ "rstrip": false,
321
+ "single_word": false,
322
+ "special": true
323
+ },
324
+ "110510": {
325
+ "content": "<pr_diff>",
326
+ "lstrip": false,
327
+ "normalized": false,
328
+ "rstrip": false,
329
+ "single_word": false,
330
+ "special": true
331
+ },
332
+ "110511": {
333
+ "content": "<pr_diff_hunk>",
334
+ "lstrip": false,
335
+ "normalized": false,
336
+ "rstrip": false,
337
+ "single_word": false,
338
+ "special": true
339
+ },
340
+ "110512": {
341
+ "content": "<pr_comment>",
342
+ "lstrip": false,
343
+ "normalized": false,
344
+ "rstrip": false,
345
+ "single_word": false,
346
+ "special": true
347
+ },
348
+ "110513": {
349
+ "content": "<pr_event_id>",
350
+ "lstrip": false,
351
+ "normalized": false,
352
+ "rstrip": false,
353
+ "single_word": false,
354
+ "special": true
355
+ },
356
+ "110514": {
357
+ "content": "<pr_review>",
358
+ "lstrip": false,
359
+ "normalized": false,
360
+ "rstrip": false,
361
+ "single_word": false,
362
+ "special": true
363
+ },
364
+ "110515": {
365
+ "content": "<pr_review_state>",
366
+ "lstrip": false,
367
+ "normalized": false,
368
+ "rstrip": false,
369
+ "single_word": false,
370
+ "special": true
371
+ },
372
+ "110516": {
373
+ "content": "<pr_review_comment>",
374
+ "lstrip": false,
375
+ "normalized": false,
376
+ "rstrip": false,
377
+ "single_word": false,
378
+ "special": true
379
+ },
380
+ "110517": {
381
+ "content": "<pr_in_reply_to_review_id>",
382
+ "lstrip": false,
383
+ "normalized": false,
384
+ "rstrip": false,
385
+ "single_word": false,
386
+ "special": true
387
+ },
388
+ "110518": {
389
+ "content": "<pr_in_reply_to_comment_id>",
390
+ "lstrip": false,
391
+ "normalized": false,
392
+ "rstrip": false,
393
+ "single_word": false,
394
+ "special": true
395
+ },
396
+ "110519": {
397
+ "content": "<pr_diff_hunk_comment_line>",
398
+ "lstrip": false,
399
+ "normalized": false,
400
+ "rstrip": false,
401
+ "single_word": false,
402
+ "special": true
403
+ },
404
+ "110520": {
405
+ "content": "<NAME>",
406
+ "lstrip": false,
407
+ "normalized": false,
408
+ "rstrip": false,
409
+ "single_word": false,
410
+ "special": true
411
+ },
412
+ "110521": {
413
+ "content": "<EMAIL>",
414
+ "lstrip": false,
415
+ "normalized": false,
416
+ "rstrip": false,
417
+ "single_word": false,
418
+ "special": true
419
+ },
420
+ "110522": {
421
+ "content": "<KEY>",
422
+ "lstrip": false,
423
+ "normalized": false,
424
+ "rstrip": false,
425
+ "single_word": false,
426
+ "special": true
427
+ },
428
+ "110523": {
429
+ "content": "<PASSWORD>",
430
+ "lstrip": false,
431
+ "normalized": false,
432
+ "rstrip": false,
433
+ "single_word": false,
434
+ "special": true
435
+ }
436
+ },
437
+ "additional_special_tokens": [
438
+ "<|endoftext|>",
439
+ "<|fim_prefix|>",
440
+ "<|fim_middle|>",
441
+ "<|fim_suffix|>",
442
+ "<|endofprompt|>",
443
+ "<|_unuse_missing_100256|>",
444
+ "<|_unuse_missing_100261|>",
445
+ "<|_unuse_missing_100262|>",
446
+ "<|_unuse_missing_100263|>",
447
+ "<|_unuse_missing_100264|>",
448
+ "<|_unuse_missing_100265|>",
449
+ "<|_unuse_missing_100266|>",
450
+ "<|_unuse_missing_100267|>",
451
+ "<|_unuse_missing_100268|>",
452
+ "<|_unuse_missing_100269|>",
453
+ "<|_unuse_missing_100270|>",
454
+ "<|_unuse_missing_100271|>",
455
+ "<|im_start|>",
456
+ "<|im_end|>",
457
+ "<|stop|>",
458
+ "<|endofturn|>",
459
+ "<repo_name>",
460
+ "<file_sep>",
461
+ "<issue_start>",
462
+ "<issue_comment>",
463
+ "<issue_closed>",
464
+ "<jupyter_start>",
465
+ "<jupyter_text>",
466
+ "<jupyter_code>",
467
+ "<jupyter_output>",
468
+ "<jupyter_script>",
469
+ "<empty_output>",
470
+ "<code_to_intermediate>",
471
+ "<intermediate_to_code>",
472
+ "<pr>",
473
+ "<pr_status>",
474
+ "<pr_is_merged>",
475
+ "<pr_base>",
476
+ "<pr_file>",
477
+ "<pr_base_code>",
478
+ "<pr_diff>",
479
+ "<pr_diff_hunk>",
480
+ "<pr_comment>",
481
+ "<pr_event_id>",
482
+ "<pr_review>",
483
+ "<pr_review_state>",
484
+ "<pr_review_comment>",
485
+ "<pr_in_reply_to_review_id>",
486
+ "<pr_in_reply_to_comment_id>",
487
+ "<pr_diff_hunk_comment_line>",
488
+ "<NAME>",
489
+ "<EMAIL>",
490
+ "<KEY>",
491
+ "<PASSWORD>"
492
+ ],
493
+ "bos_token": "<|endoftext|>",
494
+ "clean_up_tokenization_spaces": true,
495
+ "eos_token": "<|endofturn|>",
496
+ "extra_special_tokens": {},
497
+ "model_max_length": 1000000000000000019884624838656,
498
+ "pad_token": "<|endoftext|>",
499
+ "tokenizer_class": "GPT2Tokenizer",
500
+ "unk_token": "<|endoftext|>"
501
+ }
training_args.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:ef7dcfbb520eaa72697f1d45b91e159189ecefb58152a35a5fa5a95eb7d53aa9
+ size 5624
vocab.json ADDED
The diff for this file is too large to render. See raw diff
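The `tokenizer_config.json` added in this commit follows the standard Hugging Face tokenizer-config schema: `added_tokens_decoder` maps token IDs to special-token entries, and `bos_token`/`eos_token`/`pad_token` name the control tokens. As a minimal sketch (using only a small excerpt copied from the diff above, not the full config), the file can be read and its special tokens collected like this:

```python
import json

# Excerpt of the committed tokenizer_config.json (IDs and contents
# copied from the diff above; the real file contains many more entries).
config_excerpt = json.loads("""
{
  "added_tokens_decoder": {
    "110522": {"content": "<KEY>", "lstrip": false, "normalized": false,
               "rstrip": false, "single_word": false, "special": true},
    "110523": {"content": "<PASSWORD>", "lstrip": false, "normalized": false,
               "rstrip": false, "single_word": false, "special": true}
  },
  "bos_token": "<|endoftext|>",
  "eos_token": "<|endofturn|>",
  "pad_token": "<|endoftext|>",
  "tokenizer_class": "GPT2Tokenizer"
}
""")

# Map each added special token's integer ID to its surface form.
specials = {int(token_id): entry["content"]
            for token_id, entry in config_excerpt["added_tokens_decoder"].items()
            if entry["special"]}

print(specials[110523])            # <PASSWORD>
print(config_excerpt["eos_token"]) # <|endofturn|>
```

In practice one would not parse this file by hand; `AutoTokenizer.from_pretrained` on the repo directory reads the same fields automatically. Note that `bos_token` and `pad_token` both reuse `<|endoftext|>`, while the chat-style end-of-turn marker `<|endofturn|>` serves as `eos_token`.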