Xyne RL checkpoints

Merged Gemma 4 31B checkpoints from the Xyne near-online GRPO/RLAIF run.

Folder layout

Full merged checkpoints:

  • v1/checkpoint-200: retained/evaluated checkpoint 200
  • v1/checkpoint-304: retained/evaluated checkpoint 304. No local checkpoint 302 was retained; 304 is the nearest checkpoint that was evaluated.
  • v1/checkpoint-328-latest: latest merged checkpoint available after the run was stopped

Adapter-only artifacts available locally:

  • v1/adapters/adapter-200: LoRA adapter that was merged into checkpoint 200
  • v1/adapters/adapter-312-recovery: LoRA adapter for the recovered 304 -> 312 chunk

The exact adapter-only deltas for checkpoint 304 and checkpoint 328 were not retained locally after merge/pruning, so they are not uploaded as adapters. Their full merged checkpoints are available above.
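Because each artifact lives in its own folder, a single one can be fetched without pulling the whole repository. The following is a minimal sketch, assuming the files are hosted on the Hugging Face Hub and that `huggingface_hub` is installed. The `ARTIFACTS` table, `allow_patterns`, and `fetch` are hypothetical helper names introduced here, and the repo id is deliberately left as a parameter because it is not stated in this README.

```python
# Folder names as listed under "Folder layout" above.
ARTIFACTS = {
    "checkpoint-200": "v1/checkpoint-200",
    "checkpoint-304": "v1/checkpoint-304",
    "checkpoint-328-latest": "v1/checkpoint-328-latest",
    "adapter-200": "v1/adapters/adapter-200",
    "adapter-312-recovery": "v1/adapters/adapter-312-recovery",
}


def allow_patterns(name):
    """Return glob patterns that select exactly one artifact folder."""
    return [f"{ARTIFACTS[name]}/*"]


def fetch(repo_id, name, local_dir=None):
    """Download one artifact folder; repo_id must be supplied by the caller."""
    # Imported lazily so the path helpers above work without the library.
    from huggingface_hub import snapshot_download

    return snapshot_download(
        repo_id=repo_id,
        allow_patterns=allow_patterns(name),
        local_dir=local_dir,
    )
```

Calling `fetch(repo_id, "adapter-200")` with the real repo id would download only the adapter folder. An adapter folder is a LoRA delta, not a full model, so it still has to be applied to the base model before use.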

Eval results

Standalone Xyne v2 eval on 31 held-out questions, 1 rollout per question, using the same adaptive 3 -> 7 judge path as training:

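| Model            | All-in mean | Clean judged mean | Median | Valid judged   | Judge dropped | Notes                                                          |
|------------------|-------------|-------------------|--------|----------------|---------------|----------------------------------------------------------------|
| Base Gemma 4 31B | 0.5284      | ~0.546            | 0.50   | 31/31 rollouts | 1             | one judge parse dropout counted as zero in all-in score        |
| checkpoint-200   | 0.5806      | ~0.621            | 0.60   | 31/31 rollouts | 2             | two judge API timeout dropouts counted as zero in all-in score |
| checkpoint-304   | 0.6097      | 0.610             | 0.60   | 31/31 rollouts | 0             | best all-in operational score among evaluated checkpoints      |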

Interpretation:

  • Both evaluated checkpoints score above base on all-in mean and on clean judged mean.
  • checkpoint-304 has the best all-in operational score because it had no judge dropouts.
  • checkpoint-200 is strongest on clean judged-answer quality if judge-infrastructure failures are excluded.
  • The eval is directional, not a final statistical ranking, because it used one rollout per question.
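The two mean columns in the table are related by simple arithmetic. The sketch below assumes that each judge dropout zeroed exactly one rollout's score in the all-in sum, so the clean mean is the same sum divided by the number of rollouts that were not dropped. The function name `clean_mean` is introduced here for illustration; the inputs are the all-in means and dropout counts from the table.

```python
def clean_mean(all_in_mean, n_rollouts, n_dropped):
    """Mean over non-dropped rollouts, assuming each dropout scored 0 in the all-in mean."""
    if n_dropped >= n_rollouts:
        raise ValueError("at least one rollout must have a judge score")
    total = all_in_mean * n_rollouts  # dropped rollouts add nothing to this sum
    return total / (n_rollouts - n_dropped)


# (all-in mean, rollouts, judge dropouts) taken from the eval table.
rows = {
    "Base Gemma 4 31B": (0.5284, 31, 1),
    "checkpoint-200": (0.5806, 31, 2),
    "checkpoint-304": (0.6097, 31, 0),
}

for name, (mean, n, dropped) in rows.items():
    print(f"{name}: {clean_mean(mean, n, dropped):.3f}")
# Base Gemma 4 31B: 0.546
# checkpoint-200: 0.621
# checkpoint-304: 0.610
```

Under that assumption the recomputed values match the reported clean judged means, which is why checkpoint-200 overtakes checkpoint-304 once its two dropouts are excluded.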

Everything outside v1/adapters/ is a full merged checkpoint; the folders under v1/adapters/ contain LoRA adapters only.
