ZYH-LLM-Qwen2.5-14B-V4

Upgraded version: the fifth-generation model of ZYH-LLM-Qwen2.5 has been released!

This version increases the proportion of R1-distilled models in the merging recipe while maintaining the model's instruction-following ability and general capabilities.

Merge Template

merge_method: model_stock  
base_model: Instruction Model  
models:  
  - model: Instruction Fine-tuning Model 1  
  - model: Instruction Fine-tuning Model 2  
  - model: Reasoning Fine-tuning Model 1  
  - model: Reasoning Fine-tuning Model 2  
dtype: bfloat16  
tokenizer_source: base  
int8_mask: true  
normalize: true

Merging with the template above can improve the model's mathematical accuracy and reasoning ability without reducing the general capabilities of the instruction model.

ZYH-LLM-Qwen2.5-14B-V4 used this template during its merging process.
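The model_stock method follows the Model Stock idea: for each weight tensor it measures how aligned the fine-tuned models' offsets from the base are, and uses that angle to decide how far to move from the base toward the average of the fine-tuned weights. The sketch below applies the interpolation ratio t = N·cosθ / (1 + (N−1)·cosθ) to flat Python lists standing in for one tensor. Averaging cosθ over all pairs of task vectors when N > 2 is an assumption, and the function is a hypothetical illustration, not mergekit's code.

```python
import math
from itertools import combinations


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0 or nb == 0:
        return 0.0
    return dot / (na * nb)


def model_stock(base, finetuned):
    """Merge one flattened tensor with the Model Stock interpolation rule (sketch)."""
    n = len(finetuned)
    if n < 2:
        raise ValueError("model_stock needs at least two fine-tuned models")
    # Task vectors: each fine-tuned model's offset from the base weights.
    deltas = [[w - b for w, b in zip(m, base)] for m in finetuned]
    # Assumption: average the pairwise cosine similarity when there are > 2 models.
    pairs = list(combinations(deltas, 2))
    cos_theta = sum(cosine(a, b) for a, b in pairs) / len(pairs)
    # Interpolation ratio; assumes task vectors are not strongly anti-aligned.
    t = n * cos_theta / (1 + (n - 1) * cos_theta)
    # Move from the base toward the uniform average of the fine-tuned weights.
    avg = [sum(col) / n for col in zip(*finetuned)]
    merged = [t * a + (1 - t) * b for a, b in zip(avg, base)]
    return merged, t
```

Orthogonal task vectors give t = 0 (the base is kept), while identical ones give t ≈ 1 (the fine-tuned average is taken). With the four models in the template and, say, cosθ = 0.5, the ratio is t = 4·0.5 / (1 + 3·0.5) = 0.8.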

Open LLM Leaderboard Evaluation Results

Detailed results can be found here

Metric	Value
Avg.	43.14
IFEval (0-shot)	83.65
BBH (3-shot)	50.27
MATH Lvl 5 (4-shot)	53.93
GPQA (0-shot)	8.61
MuSR (0-shot)	15.66
MMLU-PRO (5-shot)	46.71
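The Avg. row appears to be the unweighted mean of the six benchmark scores listed; a quick arithmetic check:

```python
# Scores as listed in the table above.
scores = {
    "IFEval": 83.65,
    "BBH": 50.27,
    "MATH Lvl 5": 53.93,
    "GPQA": 8.61,
    "MuSR": 15.66,
    "MMLU-PRO": 46.71,
}

# Unweighted mean over the six benchmarks.
average = sum(scores.values()) / len(scores)
print(f"Avg. = {average:.2f}")  # prints "Avg. = 43.14"
```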

First stage:

Create four different instruction models and one code model.

models:  
  - model: Qwen/Qwen2.5-14B-Instruct  
    parameters:  
      density: 1  
      weight: 1  
      lambda: 0.9
  - model: Qwen/Qwen2.5-14B-Instruct-1M  
    parameters:  
      density: 1  
      weight: 1  
      lambda: 0.9  
merge_method: della  
base_model: Qwen/Qwen2.5-14B  
parameters:  
  density: 1  
  weight: 1  
  lambda: 0.9  
  normalize: true  
  int8_mask: true  
dtype: bfloat16  
tokenizer_source: base  
name: Qwen2.5-14B-della-base
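The della merges in this stage, starting with the one above, follow the DELLA idea: delta parameters with small magnitudes are dropped with higher probability, survivors are rescaled, sign conflicts are resolved TIES-style, and the merged delta is scaled by lambda before being added back to the base. The sketch below is a simplified, hypothetical per-tensor version on Python lists. The linear spread of drop probabilities across magnitude ranks (width `epsilon` around `1 - density`) and the weight-normalized averaging are assumptions, not mergekit's exact implementation. In this sketch, `density=1` with `epsilon=0` drops nothing, so the merge reduces to sign-consensus averaging of the deltas scaled by `lam`.

```python
import random


def della_merge(base, finetuned, weights, density=1.0, epsilon=0.0,
                lam=1.0, seed=0):
    """Simplified DELLA-style merge of one flattened tensor (hypothetical sketch)."""
    rng = random.Random(seed)
    n = len(base)
    p = 1.0 - density  # average drop probability
    pruned = []
    for model in finetuned:
        delta = [w - b for w, b in zip(model, base)]
        # Rank positions by |delta|: rank 0 is the smallest magnitude.
        order = sorted(range(n), key=lambda i: abs(delta[i]))
        kept = [0.0] * n
        for rank, i in enumerate(order):
            # Assumed linear schedule: smallest magnitudes get p + eps/2,
            # largest get p - eps/2; capped below 1 to keep rescaling finite.
            frac = rank / (n - 1) if n > 1 else 0.5
            p_i = min(max(p + epsilon / 2 - epsilon * frac, 0.0), 0.99)
            if rng.random() >= p_i:
                kept[i] = delta[i] / (1.0 - p_i)  # rescale survivors
        pruned.append(kept)
    merged = []
    for i in range(n):
        # Elect a sign per position from the weighted sum of the pruned deltas.
        total = sum(w * d[i] for w, d in zip(weights, pruned))
        sign = (total > 0) - (total < 0)
        # Average only deltas that agree with the elected sign (weight-normalized).
        num = sum(w * d[i] for w, d in zip(weights, pruned) if d[i] * sign > 0)
        den = sum(w for w, d in zip(weights, pruned) if d[i] * sign > 0)
        merged.append(base[i] + lam * (num / den if den else 0.0))
    return merged
```

For example, `della_merge([0.0, 0.0, 0.0], [[1.0, -1.0, 2.0], [3.0, 1.0, 2.0]], [1.0, 1.0], lam=0.9)` returns `[1.8, 0.0, 1.8]`: the middle position has a sign tie, so no delta survives there.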

models:  
  - model: Qwen/Qwen2.5-14B-Instruct  
    parameters:  
      density: 1  
      weight: 1  
      lambda: 0.9
  - model: Qwen/Qwen2.5-14B-Instruct-1M  
    parameters:  
      density: 1  
      weight: 1  
      lambda: 0.9  
merge_method: della  
base_model: arcee-ai/Virtuoso-Small-v2  
parameters:  
  density: 1  
  weight: 1  
  lambda: 0.9  
  normalize: true  
  int8_mask: true  
dtype: bfloat16  
tokenizer_source: base  
name: Qwen2.5-14B-della-v2

models:  
  - model: Qwen/Qwen2.5-14B-Instruct  
    parameters:  
      density: 1  
      weight: 1  
      lambda: 0.9
  - model: Qwen/Qwen2.5-14B-Instruct-1M  
    parameters:  
      density: 1  
      weight: 1  
      lambda: 0.9  
merge_method: della  
base_model: arcee-ai/SuperNova-Medius  
parameters:  
  density: 1  
  weight: 1  
  lambda: 0.9  
  normalize: true  
  int8_mask: true  
dtype: bfloat16  
tokenizer_source: base  
name: Qwen2.5-14B-della-Nova

models:  
  - model: Qwen/Qwen2.5-14B-Instruct  
    parameters:  
      density: 1  
      weight: 1  
      lambda: 0.9
  - model: Qwen/Qwen2.5-14B-Instruct-1M  
    parameters:  
      density: 1  
      weight: 1  
      lambda: 0.9  
merge_method: della  
base_model: Azure99/Blossom-V6-14B  
parameters:  
  density: 1  
  weight: 1  
  lambda: 0.9  
  normalize: true  
  int8_mask: true  
dtype: bfloat16  
tokenizer_source: base  
name: Qwen2.5-14B-della-V6

models:  
  - model: Qwen/Qwen2.5-Coder-14B-Instruct  
    parameters:  
      density: 1  
      weight: 1  
      lambda: 0.9  
merge_method: della  
base_model: Qwen/Qwen2.5-Coder-14B  
parameters:  
  density: 1  
  weight: 1  
  lambda: 0.9  
  normalize: true  
  int8_mask: true  
dtype: bfloat16  
tokenizer_source: base  
name: Qwen2.5-Coder-14B-della
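The four instruction-model configs in this stage differ only in `base_model` and `name`, and the code-model config follows the same pattern with a single model. A small, hypothetical Python helper can render all five from one template, which makes the shared structure explicit and keeps the parameters in sync. The helper name and the `---` multi-document layout are illustrative choices; the YAML is built as plain strings to avoid assuming any particular YAML library.

```python
# Shared per-model and global della parameters used throughout the first stage.
DELLA_PARAMS = {"density": 1, "weight": 1, "lambda": 0.9}


def della_config(base_model, name, models):
    """Render one della merge config as YAML text (illustrative helper)."""
    lines = ["models:"]
    for model in models:
        lines.append(f"  - model: {model}")
        lines.append("    parameters:")
        for key, value in DELLA_PARAMS.items():
            lines.append(f"      {key}: {value}")
    lines += ["merge_method: della", f"base_model: {base_model}", "parameters:"]
    for key, value in DELLA_PARAMS.items():
        lines.append(f"  {key}: {value}")
    lines += ["  normalize: true", "  int8_mask: true",
              "dtype: bfloat16", "tokenizer_source: base", f"name: {name}"]
    return "\n".join(lines) + "\n"


INSTRUCT = ["Qwen/Qwen2.5-14B-Instruct", "Qwen/Qwen2.5-14B-Instruct-1M"]
FIRST_STAGE = [
    della_config("Qwen/Qwen2.5-14B", "Qwen2.5-14B-della-base", INSTRUCT),
    della_config("arcee-ai/Virtuoso-Small-v2", "Qwen2.5-14B-della-v2", INSTRUCT),
    della_config("arcee-ai/SuperNova-Medius", "Qwen2.5-14B-della-Nova", INSTRUCT),
    della_config("Azure99/Blossom-V6-14B", "Qwen2.5-14B-della-V6", INSTRUCT),
    della_config("Qwen/Qwen2.5-Coder-14B", "Qwen2.5-Coder-14B-della",
                 ["Qwen/Qwen2.5-Coder-14B-Instruct"]),
]

if __name__ == "__main__":
    # Separate the documents with "---" so they form one multi-document YAML stream.
    print("---\n".join(FIRST_STAGE))
```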

Second stage:

Step 1:

Use the merge template to create three instruction models biased toward reasoning.

merge_method: model_stock  
base_model: Qwen2.5-14B-della-base  
models:  
  - model: Qwen2.5-Coder-14B-della  
  - model: Qwen2.5-14B-della-v2  
  - model: deepseek-ai/DeepSeek-R1-Distill-Qwen-14B  
  - model: huihui-ai/DeepSeek-R1-Distill-Qwen-14B-abliterated-v2  
dtype: bfloat16  
tokenizer_source: base  
int8_mask: true  
normalize: true  
name: Qwen2.5-14B-mst-Coder

merge_method: model_stock  
base_model: Qwen2.5-14B-della-base  
models:  
  - model: Qwen2.5-14B-della-V6  
  - model: Qwen2.5-14B-della-v2  
  - model: deepseek-ai/DeepSeek-R1-Distill-Qwen-14B  
  - model: huihui-ai/DeepSeek-R1-Distill-Qwen-14B-abliterated-v2  
dtype: bfloat16  
tokenizer_source: base  
int8_mask: true  
normalize: true  
name: Qwen2.5-14B-mst-V6

merge_method: model_stock  
base_model: Qwen2.5-14B-della-base  
models:  
  - model: Qwen2.5-14B-della-Nova  
  - model: Qwen2.5-14B-della-v2  
  - model: deepseek-ai/DeepSeek-R1-Distill-Qwen-14B  
  - model: huihui-ai/DeepSeek-R1-Distill-Qwen-14B-abliterated-v2  
dtype: bfloat16  
tokenizer_source: base  
int8_mask: true  
normalize: true  
name: Qwen2.5-14B-mst-Nova

Step 2:

Create a pure instruction model to restore the general capabilities of the final model.

merge_method: model_stock  
base_model: Qwen2.5-14B-della-base  
models:  
  - model: Qwen2.5-14B-della-Nova  
  - model: Qwen2.5-14B-della-v2  
  - model: Qwen2.5-14B-della-V6   
dtype: bfloat16  
tokenizer_source: base  
int8_mask: true  
normalize: true  
name: Qwen2.5-14B-mst-it

Third stage:

Create a base model with a 1-million-token context window.

merge_method: sce  
models:
  # Pivot model
  - model: Qwen/Qwen2.5-14B-Instruct-1M
  # Target models  
  - model: Qwen/Qwen2.5-14B  
base_model: Qwen/Qwen2.5-14B-Instruct-1M  
parameters:  
  select_topk: 1  
dtype: bfloat16  
tokenizer_source: base  
normalize: true  
int8_mask: true  
name: Qwen2.5-14B-1M
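The sce method, as described for FuseChat-style model fusion, works in three steps: Select the delta positions with the highest variance across models (the `select_topk` fraction), Calculate per-model weights from the squared magnitude of their deltas, and Erase delta entries whose sign disagrees with the elected sign. Below is a hypothetical per-tensor sketch on Python lists; the variance ranking, weight formula, and sign election follow that description but are assumptions, not mergekit's implementation. In this sketch, `select_topk=1` keeps every position, matching the `select_topk: 1` setting in the config above.

```python
import math
from statistics import pvariance


def sce_merge(base, models, select_topk=1.0):
    """Simplified Select-Calculate-Erase merge of one flattened tensor (sketch)."""
    n = len(base)
    deltas = [[w - b for w, b in zip(m, base)] for m in models]
    # Select: keep the select_topk fraction of positions with the highest
    # variance across models; select_topk=1 keeps every position.
    if select_topk < 1.0:
        variances = [pvariance([d[i] for d in deltas]) for i in range(n)]
        k = max(1, math.ceil(select_topk * n))
        keep = set(sorted(range(n), key=lambda i: variances[i], reverse=True)[:k])
        deltas = [[x if i in keep else 0.0 for i, x in enumerate(d)] for d in deltas]
    # Calculate: weight each model by the energy (sum of squares) of its delta.
    energy = [sum(x * x for x in d) for d in deltas]
    total = sum(energy)
    if total > 0:
        weights = [e / total for e in energy]
    else:
        weights = [1 / len(deltas)] * len(deltas)
    merged = []
    for i in range(n):
        # Erase: elect a sign from the summed deltas and drop disagreeing entries.
        s = sum(d[i] for d in deltas)
        sign = (s > 0) - (s < 0)
        agree = [(w, d[i]) for w, d in zip(weights, deltas) if d[i] * sign > 0]
        den = sum(w for w, _ in agree)
        num = sum(w * x for w, x in agree)
        merged.append(base[i] + (num / den if den > 0 else 0.0))
    return merged
```

Note that a model identical to the base contributes a zero delta, so in this sketch it receives zero weight in the Calculate step.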

models:  
  - model: Qwen/Qwen2.5-14B-Instruct  
    parameters:  
      density: 1  
      weight: 1  
      lambda: 0.9
  - model: Qwen/Qwen2.5-14B-Instruct-1M  
    parameters:  
      density: 1  
      weight: 1  
      lambda: 0.9  
merge_method: della  
base_model: Qwen2.5-14B-1M  
parameters:  
  density: 1  
  weight: 1  
  lambda: 0.9  
  normalize: true  
  int8_mask: true  
dtype: bfloat16  
tokenizer_source: base  
name: Qwen2.5-14B-della-1M

Final stage:

merge_method: model_stock  
base_model: Qwen2.5-14B-della-1M  
models:  
  - model: Qwen2.5-14B-mst-Coder  
  - model: Qwen2.5-14B-mst-V6  
  - model: Qwen2.5-14B-mst-Nova  
  - model: Qwen2.5-14B-mst-it  
dtype: bfloat16  
tokenizer_source: base  
int8_mask: true  
normalize: true  
name: ZYH-LLM-Qwen2.5-14B-V4
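Because later stages consume the outputs of earlier ones, the recipe forms a dependency graph. The hypothetical sketch below encodes each merge's inputs (its `base_model` plus its `models` list, as written in the configs above) and uses Python's standard `graphlib.TopologicalSorter` to derive an order in which every intermediate model is built before it is used. The helper names are illustrative; the code only plans the order and traces upstream Hub models, it does not run any merge.

```python
from graphlib import TopologicalSorter

INSTRUCT = {"Qwen/Qwen2.5-14B-Instruct", "Qwen/Qwen2.5-14B-Instruct-1M"}
R1 = {"deepseek-ai/DeepSeek-R1-Distill-Qwen-14B",
      "huihui-ai/DeepSeek-R1-Distill-Qwen-14B-abliterated-v2"}

# Each merged model -> the models it is built from (base_model plus models list).
RECIPE = {
    "Qwen2.5-14B-della-base": {"Qwen/Qwen2.5-14B"} | INSTRUCT,
    "Qwen2.5-14B-della-v2": {"arcee-ai/Virtuoso-Small-v2"} | INSTRUCT,
    "Qwen2.5-14B-della-Nova": {"arcee-ai/SuperNova-Medius"} | INSTRUCT,
    "Qwen2.5-14B-della-V6": {"Azure99/Blossom-V6-14B"} | INSTRUCT,
    "Qwen2.5-Coder-14B-della": {"Qwen/Qwen2.5-Coder-14B",
                                "Qwen/Qwen2.5-Coder-14B-Instruct"},
    "Qwen2.5-14B-mst-Coder": {"Qwen2.5-14B-della-base", "Qwen2.5-Coder-14B-della",
                              "Qwen2.5-14B-della-v2"} | R1,
    "Qwen2.5-14B-mst-V6": {"Qwen2.5-14B-della-base", "Qwen2.5-14B-della-V6",
                           "Qwen2.5-14B-della-v2"} | R1,
    "Qwen2.5-14B-mst-Nova": {"Qwen2.5-14B-della-base", "Qwen2.5-14B-della-Nova",
                             "Qwen2.5-14B-della-v2"} | R1,
    "Qwen2.5-14B-mst-it": {"Qwen2.5-14B-della-base", "Qwen2.5-14B-della-Nova",
                           "Qwen2.5-14B-della-v2", "Qwen2.5-14B-della-V6"},
    "Qwen2.5-14B-1M": {"Qwen/Qwen2.5-14B-Instruct-1M", "Qwen/Qwen2.5-14B"},
    "Qwen2.5-14B-della-1M": {"Qwen2.5-14B-1M"} | INSTRUCT,
    "ZYH-LLM-Qwen2.5-14B-V4": {"Qwen2.5-14B-della-1M", "Qwen2.5-14B-mst-Coder",
                               "Qwen2.5-14B-mst-V6", "Qwen2.5-14B-mst-Nova",
                               "Qwen2.5-14B-mst-it"},
}


def build_order(recipe):
    """Merged models ordered so that every input is built before it is used."""
    return [m for m in TopologicalSorter(recipe).static_order() if m in recipe]


def hub_sources(model, recipe):
    """All upstream models that are not themselves produced by the recipe."""
    if model not in recipe:
        return {model}
    return set().union(*(hub_sources(parent, recipe) for parent in recipe[model]))


if __name__ == "__main__":
    for step, model in enumerate(build_order(RECIPE), start=1):
        print(step, model)
```

Since every intermediate model feeds the final merge, the final model always comes last in the computed order, and tracing its sources shows which ten Hub checkpoints the recipe ultimately draws on.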

YOYO-AI/ZYH-LLM-Qwen2.5-14B-V4