nthehai01 and lbourdois committed on
Commit 7dcef93 · verified · 1 Parent(s): 9677de6

Improve language tag (#1)


- Improve language tag (9307bfc173f4589e1822beded86b2d5fd05a5b46)


Co-authored-by: Loïck BOURDOIS <lbourdois@users.noreply.huggingface.co>

Files changed (1)
  1. README.md +70 -57
README.md CHANGED
@@ -1,57 +1,70 @@
- ---
- base_model:
- - Qwen/Qwen2.5-Math-7B
- - Qwen/Qwen2.5-7B-Instruct
- - Qwen/Qwen2.5-7B
- library_name: transformers
- tags:
- - mergekit
- - merge
-
- ---
- # Qwen2.5-7B-Instruct-Math-task-arithmetic
-
- This is a merge of pre-trained language models created using [mergekit](https://github.com/cg123/mergekit).
-
- ## Performance
- | Metric                          | Value |
- |---------------------------------|------:|
- | GSM8k (zero-shot)               | 91.35 |
- | HellaSwag (zero-shot)           | 80.01 |
- | MBPP (zero-shot)                | 61.01 |
-
- ## Merge Details
- ### Merge Method
-
- This model was merged using the [Task Arithmetic](https://arxiv.org/abs/2212.04089) merge method, with [Qwen/Qwen2.5-7B](https://huggingface.co/Qwen/Qwen2.5-7B) as the base.
-
- ### Models Merged
-
- The following models were included in the merge:
- * [Qwen/Qwen2.5-Math-7B](https://huggingface.co/Qwen/Qwen2.5-Math-7B)
- * [Qwen/Qwen2.5-7B-Instruct](https://huggingface.co/Qwen/Qwen2.5-7B-Instruct)
-
- ### Configuration
-
- The following YAML configuration was used to produce this model:
-
- ```yaml
- base_model: Qwen/Qwen2.5-7B
- dtype: bfloat16
- merge_method: task_arithmetic
- parameters:
-   lambda: 0.7870041304118442
-   normalize: 1.0
- slices:
- - sources:
-   - layer_range: [0, 28]
-     model: Qwen/Qwen2.5-7B
-   - layer_range: [0, 28]
-     model: Qwen/Qwen2.5-Math-7B
-     parameters:
-       weight: 0.11841208483160265
-   - layer_range: [0, 28]
-     model: Qwen/Qwen2.5-7B-Instruct
-     parameters:
-       weight: 0.7783861791140264
- ```
+ ---
+ base_model:
+ - Qwen/Qwen2.5-Math-7B
+ - Qwen/Qwen2.5-7B-Instruct
+ - Qwen/Qwen2.5-7B
+ library_name: transformers
+ tags:
+ - mergekit
+ - merge
+ language:
+ - zho
+ - eng
+ - fra
+ - spa
+ - por
+ - deu
+ - ita
+ - rus
+ - jpn
+ - kor
+ - vie
+ - tha
+ - ara
+ ---
+ # Qwen2.5-7B-Instruct-Math-task-arithmetic
+
+ This is a merge of pre-trained language models created using [mergekit](https://github.com/cg123/mergekit).
+
+ ## Performance
+ | Metric                          | Value |
+ |---------------------------------|------:|
+ | GSM8k (zero-shot)               | 91.35 |
+ | HellaSwag (zero-shot)           | 80.01 |
+ | MBPP (zero-shot)                | 61.01 |
+
+ ## Merge Details
+ ### Merge Method
+
+ This model was merged using the [Task Arithmetic](https://arxiv.org/abs/2212.04089) merge method, with [Qwen/Qwen2.5-7B](https://huggingface.co/Qwen/Qwen2.5-7B) as the base.
+
+ ### Models Merged
+
+ The following models were included in the merge:
+ * [Qwen/Qwen2.5-Math-7B](https://huggingface.co/Qwen/Qwen2.5-Math-7B)
+ * [Qwen/Qwen2.5-7B-Instruct](https://huggingface.co/Qwen/Qwen2.5-7B-Instruct)
+
+ ### Configuration
+
+ The following YAML configuration was used to produce this model:
+
+ ```yaml
+ base_model: Qwen/Qwen2.5-7B
+ dtype: bfloat16
+ merge_method: task_arithmetic
+ parameters:
+   lambda: 0.7870041304118442
+   normalize: 1.0
+ slices:
+ - sources:
+   - layer_range: [0, 28]
+     model: Qwen/Qwen2.5-7B
+   - layer_range: [0, 28]
+     model: Qwen/Qwen2.5-Math-7B
+     parameters:
+       weight: 0.11841208483160265
+   - layer_range: [0, 28]
+     model: Qwen/Qwen2.5-7B-Instruct
+     parameters:
+       weight: 0.7783861791140264
+ ```
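For readers unfamiliar with the merge method in the config above, here is a minimal sketch of task arithmetic (not mergekit's actual implementation): each fine-tuned model contributes a task vector, i.e. its parameters minus the base model's, and the merged parameters are the base plus a lambda-scaled, weighted sum of those vectors; with `normalize: 1.0`, the per-model weights are first rescaled to sum to one. Plain Python lists stand in for parameter tensors, and the toy values are illustrative only.

```python
def task_arithmetic(base, tuned, weights, lam, normalize=True):
    """Merge fine-tuned parameter vectors into a base via task arithmetic.

    base:    list of floats (the base model's parameters)
    tuned:   list of parameter lists, one per fine-tuned model
    weights: per-model weights (cf. `weight:` in the YAML config)
    lam:     global scaling factor (cf. `lambda:` in the YAML config)
    """
    if normalize:
        total = sum(weights)
        weights = [w / total for w in weights]
    merged = list(base)
    for params, w in zip(tuned, weights):
        for i, (p, b) in enumerate(zip(params, base)):
            # task vector = fine-tuned parameters minus base parameters
            merged[i] += lam * w * (p - b)
    return merged

# Two toy "models" fine-tuned from base [1.0, 2.0]:
base = [1.0, 2.0]
tuned = [[2.0, 2.0], [1.0, 4.0]]  # task vectors [1, 0] and [0, 2]
print(task_arithmetic(base, tuned, weights=[0.5, 0.5], lam=1.0))  # [1.5, 3.0]
```

With mergekit installed, a configuration like the one in this commit is typically applied with `mergekit-yaml config.yaml ./merged-model`; exact flags may vary between mergekit versions.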