LongWriter-Zero: a purely RL-trained LLM from Tsinghua University that generates coherent passages of 10K+ tokens
Model:
THU-KEG/LongWriter-Zero-32B
Dataset:
THU-KEG/LongWriter-Zero-RLData
Paper:
LongWriter-Zero: Mastering Ultra-Long Text Generation via Reinforcement Learning (2506.18841)
✨ 32B parameters
✨ Multi-reward GRPO: length, fluency, structure, non-redundancy
✨ Enforces <think><answer> format via Format RM
✨ Built on Qwen2.5-32B-base
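
To make the bullets above concrete, here is a rough sketch of two of the ideas: checking the <think><answer> layout and turning several reward scores into GRPO-style group-relative advantages. This is not the authors' code. The regex check is a rule-based stand-in for the Format RM, the unweighted mean over reward components is an assumption, and the names `format_reward`, `combined_reward`, and `group_advantages` are hypothetical.

```python
import re
import statistics

# Assumed layout: one <think>...</think> block followed by one <answer>...</answer> block.
FORMAT_RE = re.compile(
    r"\s*<think>(.+?)</think>\s*<answer>(.+?)</answer>\s*", re.DOTALL
)
TAGS = ("<think>", "</think>", "<answer>", "</answer>")


def format_reward(text: str) -> float:
    """Return 1.0 if each tag appears once and in the expected order, else 0.0."""
    if any(text.count(tag) != 1 for tag in TAGS):
        return 0.0
    return 1.0 if FORMAT_RE.fullmatch(text) else 0.0


def combined_reward(scores: dict[str, float]) -> float:
    """Unweighted mean of per-aspect scores (an assumption, not the paper's rule)."""
    return statistics.fmean(scores.values())


def group_advantages(rewards: list[float]) -> list[float]:
    """GRPO-style advantage: standardize each reward within its sampled group."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    if std == 0:
        # All samples scored the same, so none is preferred over the others.
        return [0.0] * len(rewards)
    return [(r - mean) / std for r in rewards]


if __name__ == "__main__":
    samples = [
        "<think>outline the essay</think><answer>Long essay text</answer>",
        "Long essay text with no tags",
    ]
    rewards = [format_reward(s) for s in samples]
    print(rewards, group_advantages(rewards))
```

In this sketch each prompt's sampled completions form one group, so a completion is only rewarded for being better than its siblings rather than for its absolute score.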