Xyne RL checkpoints

Merged Gemma 4 31B checkpoints from the Xyne near-online GRPO/RLAIF run.

Folder layout

Full merged checkpoints:

v1/checkpoint-200: retained/evaluated checkpoint 200
v1/checkpoint-304: retained/evaluated checkpoint 304. Checkpoint 302 was not retained locally; 304 is the nearest checkpoint that was evaluated.
v1/checkpoint-328-latest: latest merged checkpoint available after the run was stopped

Adapter-only artifacts available locally:

v1/adapters/adapter-200: LoRA adapter that was merged into checkpoint 200
v1/adapters/adapter-312-recovery: LoRA adapter for the recovered 304 -> 312 chunk

The exact adapter-only deltas for checkpoint 304 and checkpoint 328 were not retained locally after merge/pruning, so they are not uploaded as adapters. Their full merged checkpoints are available above.
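Because each full merged checkpoint is large, it can be useful to fetch a single folder rather than the whole repository. The following is a minimal sketch, assuming the files are laid out under the folder names listed above and that `huggingface_hub` is installed. `REPO_ID` is a hypothetical placeholder, and `subfolder_patterns` and `download_subfolder` are illustrative helper names, not part of this repository.

```python
from pathlib import PurePosixPath

# Hypothetical placeholder: replace with the actual repository id of this model.
REPO_ID = "your-org/your-repo"


def subfolder_patterns(subfolder: str) -> list[str]:
    """Build an allow-list glob matching every file under one repo subfolder."""
    clean = PurePosixPath(subfolder.strip("/"))
    return [f"{clean}/*"]


def download_subfolder(subfolder: str, local_dir: str = "xyne-rl") -> str:
    """Download only the given subfolder and return the local snapshot path."""
    # Imported lazily so the helper above can be used without the library.
    from huggingface_hub import snapshot_download

    return snapshot_download(
        repo_id=REPO_ID,
        allow_patterns=subfolder_patterns(subfolder),
        local_dir=local_dir,
    )


if __name__ == "__main__":
    # Show the patterns for one adapter and one full merged checkpoint.
    print(subfolder_patterns("v1/adapters/adapter-200"))
    print(subfolder_patterns("v1/checkpoint-304"))
```

Calling `download_subfolder("v1/adapters/adapter-200")` would fetch only that adapter, which is much smaller than a merged checkpoint; applying it still requires the base model.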

Eval results

Standalone Xyne v2 eval on 31 held-out questions, 1 rollout per question, using the same adaptive 3 -> 7 judge path as training:

Model	All-in mean	Clean judged mean	Median	Valid judged	Judge dropped	Notes
Base Gemma 4 31B	0.5284	~0.546	0.50	31/31 rollouts	1	one judge parse dropout counted as zero in all-in score
checkpoint-200	0.5806	~0.621	0.60	31/31 rollouts	2	two judge API timeout dropouts counted as zero in all-in score
checkpoint-304	0.6097	0.610	0.60	31/31 rollouts	0	best all-in operational score among evaluated checkpoints
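
The two mean columns differ only in how judge dropouts are treated: the all-in mean counts each dropped rollout as zero, while the clean judged mean averages over the rollouts that were actually judged. The sketch below reproduces the clean column from the all-in column under that assumption. `clean_mean` is an illustrative helper name, and the inputs are the table values, not per-question scores.

```python
def clean_mean(all_in_mean: float, n_rollouts: int, n_dropped: int) -> float:
    """Recover the mean over judged rollouts from the all-in mean.

    Assumes every dropped rollout contributed exactly 0 to the all-in sum.
    """
    judged = n_rollouts - n_dropped
    if judged <= 0:
        raise ValueError("no judged rollouts left")
    return all_in_mean * n_rollouts / judged


# (all-in mean, rollouts, judge dropouts), copied from the table above.
rows = {
    "Base Gemma 4 31B": (0.5284, 31, 1),
    "checkpoint-200": (0.5806, 31, 2),
    "checkpoint-304": (0.6097, 31, 0),
}

for name, (mean, n, dropped) in rows.items():
    print(f"{name}: {clean_mean(mean, n, dropped):.3f}")
```

Under this assumption the recomputed values match the Clean judged mean column to three decimals, which is why checkpoint-200 overtakes checkpoint-304 once its two timeouts are excluded.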

Interpretation:

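Training improved over base: both evaluated checkpoints score above Base Gemma 4 31B on the all-in mean, the clean judged mean, and the median.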
checkpoint-304 has the best all-in operational score because it had no judge dropouts.
checkpoint-200 is strongest on clean judged-answer quality if judge-infrastructure failures are excluded.
The eval is directional, not a final statistical ranking, because it used one rollout per question.

Everything under v1/ is a full merged checkpoint, except the contents of v1/adapters/, which are LoRA adapters only.
