LongWriter-Zero 🔥 A purely RL-trained LLM from Tsinghua University that writes coherent passages of 10K+ tokens
Model:
THU-KEG/LongWriter-Zero-32B
Dataset:
THU-KEG/LongWriter-Zero-RLData
Paper:
LongWriter-Zero: Mastering Ultra-Long Text Generation via Reinforcement Learning (2506.18841)
✨ 32B
✨ Multi-reward GRPO: length, fluency, structure, non-redundancy
✨ Enforces the <think><answer> output format via a Format RM
✨ Built on Qwen2.5-32B-base
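
To make the multi-reward GRPO and format bullets concrete, here is a minimal sketch, not the authors' implementation. It assumes the per-aspect scores are simply averaged into one reward and that advantages are normalized within a group of samples for the same prompt, as in standard GRPO. The function names, the regex, and the equal weighting are all hypothetical; the paper may combine rewards differently.

```python
import re
from statistics import mean, pstdev

# Hypothetical format check: a <think> block followed by an <answer> block.
# It is deliberately loose and does not reject repeated or nested tags.
FORMAT_RE = re.compile(
    r"\s*<think>.*?</think>\s*<answer>.*?</answer>\s*", re.DOTALL
)

def format_ok(text: str) -> bool:
    """Return True if the whole output follows the <think><answer> layout."""
    return FORMAT_RE.fullmatch(text) is not None

def combine(scores: dict[str, float]) -> float:
    """Assumption: equal-weight average of the per-aspect reward scores."""
    return mean(scores.values())

def group_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """GRPO-style advantage: normalize each reward against its own group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]

# Usage: three sampled outputs for one prompt, each scored on the listed aspects.
group = [
    {"length": 0.9, "fluency": 0.8, "structure": 0.7, "non_redundancy": 0.6},
    {"length": 0.4, "fluency": 0.9, "structure": 0.5, "non_redundancy": 0.8},
    {"length": 0.2, "fluency": 0.3, "structure": 0.4, "non_redundancy": 0.5},
]
advantages = group_advantages([combine(s) for s in group])
```

Samples that score above the group mean get a positive advantage and are reinforced; those below the mean are pushed down.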