How to use juspay/xyne-rl with Transformers:

```python
# Load model directly
from transformers import AutoModel

model = AutoModel.from_pretrained("juspay/xyne-rl", dtype="auto")
```
Xyne RL checkpoints
Merged Gemma 4 31B checkpoints from the Xyne near-online GRPO/RLAIF run.
Folder layout
Full merged checkpoints:
- v1/checkpoint-200: retained/evaluated checkpoint 200
- v1/checkpoint-304: retained/evaluated checkpoint 304. No local checkpoint 302 was retained; 304 is the nearest retained checkpoint and the one that was evaluated.
- v1/checkpoint-328-latest: latest merged checkpoint available after the run was stopped
Adapter-only artifacts available locally:
- v1/adapters/adapter-200: LoRA adapter that was merged into checkpoint 200
- v1/adapters/adapter-312-recovery: LoRA adapter for the recovered 304 -> 312 chunk
The exact adapter-only deltas for checkpoint 304 and checkpoint 328 were not retained locally after merge/pruning, so they are not uploaded as adapters. Their full merged checkpoints are available above.
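Because each merged checkpoint lives in its own folder, loading one typically means pointing Transformers at that folder. The sketch below is one way to do this, assuming each checkpoint folder holds a complete config and weights that `from_pretrained` can read through its `subfolder` argument. `CHECKPOINTS`, `checkpoint_subfolder`, and `load_checkpoint` are illustrative names, not part of this repository, and `dtype="auto"` assumes a recent Transformers release.

```python
REPO_ID = "juspay/xyne-rl"

# Short labels mapped to the merged-checkpoint folders listed above.
CHECKPOINTS = {
    "200": "v1/checkpoint-200",
    "304": "v1/checkpoint-304",
    "328-latest": "v1/checkpoint-328-latest",
}


def checkpoint_subfolder(label: str) -> str:
    """Return the repo subfolder for a checkpoint label, or raise on unknown labels."""
    try:
        return CHECKPOINTS[label]
    except KeyError:
        known = ", ".join(sorted(CHECKPOINTS))
        raise ValueError(f"unknown checkpoint {label!r}; expected one of: {known}")


def load_checkpoint(label: str):
    """Load one merged checkpoint (assumption: the folder is a complete model directory)."""
    # Imported lazily so the helpers above work without transformers installed.
    from transformers import AutoModel

    return AutoModel.from_pretrained(
        REPO_ID,
        subfolder=checkpoint_subfolder(label),
        dtype="auto",
    )
```

Calling `load_checkpoint("304")` downloads the full 31B merged weights, so it needs substantial disk space and memory. The adapter folders under v1/adapters/ are not covered by this sketch.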
Eval results
Standalone Xyne v2 eval on 31 held-out questions, 1 rollout per question, using the same adaptive 3 -> 7 judge path as training:
| Model | All-in mean | Clean judged mean | Median | Valid judged | Judge dropped | Notes |
|---|---|---|---|---|---|---|
| Base Gemma 4 31B | 0.5284 | ~0.546 | 0.50 | 31/31 rollouts | 1 | one judge parse dropout counted as zero in all-in score |
| checkpoint-200 | 0.5806 | ~0.621 | 0.60 | 31/31 rollouts | 2 | two judge API timeout dropouts counted as zero in all-in score |
| checkpoint-304 | 0.6097 | 0.610 | 0.60 | 31/31 rollouts | 0 | best all-in operational score among evaluated checkpoints |
Interpretation:
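- Training improved over base: both evaluated checkpoints score higher than base Gemma 4 31B on all-in mean (0.5806 and 0.6097 vs. 0.5284) and on clean judged mean.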
- checkpoint-304 has the best all-in operational score because it had no judge dropouts.
- checkpoint-200 is strongest on clean judged-answer quality if judge-infrastructure failures are excluded.
- The eval is directional, not a final statistical ranking, because it used one rollout per question.
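The gap between the all-in and clean judged columns can be reproduced from the table itself. The sketch below assumes that each judge dropout contributes a score of zero to the all-in mean (as the Notes column states) and that the clean judged mean is the average over the remaining rollouts. `clean_mean` is an illustrative helper, not code from the training run.

```python
def clean_mean(all_in_mean: float, n_rollouts: int, n_dropped: int) -> float:
    """Average over judged rollouts only, assuming dropouts scored 0 in all_in_mean."""
    judged = n_rollouts - n_dropped
    if judged <= 0:
        raise ValueError("at least one rollout must have been judged")
    # The all-in total equals the clean total, since dropped rollouts add zero.
    return all_in_mean * n_rollouts / judged


# (all-in mean, rollouts, judge dropouts) copied from the table above.
rows = {
    "Base Gemma 4 31B": (0.5284, 31, 1),
    "checkpoint-200": (0.5806, 31, 2),
    "checkpoint-304": (0.6097, 31, 0),
}

for name, (all_in, n, dropped) in rows.items():
    print(f"{name}: clean judged mean ~ {clean_mean(all_in, n, dropped):.3f}")
```

Under these assumptions the recomputed values round to 0.546, 0.621, and 0.610, matching the Clean judged mean column. This is why checkpoint-200 leads on clean quality while checkpoint-304 leads on the all-in score.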
Everything in this repository is a full merged checkpoint except the artifacts under v1/adapters/, which are LoRA adapters only.