---
license: mit
language:
- en
base_model:
- Qwen/Qwen3-30B-A3B-Thinking-2507
---

# PromptCoT-2.0-SelfPlay-30B-A3B

This model is part of **PromptCoT 2.0** (*Scaling Prompt Synthesis for LLM Reasoning*). It is a **30B-A3B model trained via self-play**, where problems synthesized by PromptCoT 2.0 provide **verifiable feedback** (unit tests for code, boxed answers for math). The training loop uses **Direct Preference Optimization (DPO)** to align generations with automatically verified outcomes, removing the dependence on stronger external teachers.

This model achieves **state-of-the-art performance at the 30B scale**, competitive with closed-source models such as Gemini 2.5 Pro and OpenAI o3.

---

## ✨ Highlights

- **Self-Play Training**: The model improves autonomously using **synthetic math & code problems** generated by PromptCoT 2.0. Positive/negative pairs are constructed from verifiable feedback signals (unit-test success / final-answer correctness); see the sketch below.
- **Competitive with Closed-Source Models**: Despite activating only **3B parameters** per token, the model achieves results comparable to **Gemini 2.5 Pro** and **OpenAI o3** across both math and code.
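
For intuition, here is a minimal sketch of how verifiable feedback can be turned into DPO preference pairs. It is illustrative only: the helper names (`check_boxed_answer`, `run_unit_tests`, `build_dpo_pairs`), the record layout, and the one-to-one pairing strategy are assumptions made for this card, not the actual PromptCoT 2.0 training code (see the GitHub repository for that).

```python
# Illustrative sketch: turning verifiable feedback into DPO preference pairs.
# Helper names, record layout, and pairing strategy are assumptions for
# illustration, not the actual PromptCoT 2.0 pipeline.
import re
import subprocess
import sys
import tempfile


def check_boxed_answer(completion: str, reference: str) -> bool:
    """Math feedback: compare the last \\boxed{...} in the completion with the
    reference answer (simplified: no nested braces or algebraic equivalence)."""
    matches = re.findall(r"\\boxed\{([^{}]*)\}", completion)
    return bool(matches) and matches[-1].strip() == reference.strip()


def run_unit_tests(program: str, tests: list[str], timeout: float = 10.0) -> bool:
    """Code feedback: run the candidate program followed by assert-style tests
    in a subprocess; success means the script exits cleanly."""
    source = program + "\n\n" + "\n".join(tests) + "\n"
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(source)
        path = f.name
    try:
        result = subprocess.run([sys.executable, path], capture_output=True, timeout=timeout)
        return result.returncode == 0
    except subprocess.TimeoutExpired:
        return False


def build_dpo_pairs(problem: dict, completions: list[str]) -> list[dict]:
    """Split sampled completions into verified (chosen) and failed (rejected)
    groups and emit (prompt, chosen, rejected) records for DPO training."""
    if problem["type"] == "math":
        verified = [c for c in completions if check_boxed_answer(c, problem["answer"])]
    else:  # "code"
        verified = [c for c in completions if run_unit_tests(c, problem["tests"])]
    failed = [c for c in completions if c not in verified]
    # Pair verified and failed completions one-to-one to keep the dataset balanced.
    return [
        {"prompt": problem["prompt"], "chosen": ch, "rejected": rj}
        for ch, rj in zip(verified, failed)
    ]
```

Records in this `(prompt, chosen, rejected)` format can then be consumed by standard DPO trainers (e.g., TRL's `DPOTrainer`).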

---

## 📊 Results

Performance of PromptCoT-2.0-SelfPlay-30B-A3B on six benchmarks (AIME24/25, HMMT Feb25, LiveCodeBench v5/v6, Codeforces): the model achieves results competitive with Gemini 2.5 Pro and OpenAI o3 while surpassing strong open-source baselines.
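
For a quick try, here is a minimal inference sketch with 🤗 Transformers. The repository id, sampling settings, and chat-template call below are assumptions carried over from the Qwen3-30B-A3B-Thinking-2507 base model; refer to the GitHub repository for the recommended setup.

```python
# Minimal inference sketch. Assumptions: the repo id below and a transformers
# version with Qwen3-MoE support; adjust to the official instructions.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "xl-zhao/PromptCoT-2.0-SelfPlay-30B-A3B"  # assumed repository id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")

messages = [
    {"role": "user", "content": "Find all positive integers n such that n + 1 divides n^2 + 1."}
]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

# Thinking models emit a long reasoning trace before the final boxed answer,
# so allow a generous generation budget.
outputs = model.generate(inputs, max_new_tokens=4096, do_sample=True, temperature=0.6, top_p=0.95)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```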

---

## 🔮 Key Takeaways

- **Math + code reasoning**: Strong, balanced gains across both **Olympiad-level math** (AIME, HMMT) and **competitive programming** (LiveCodeBench, Codeforces).
- **Efficient scaling**: Uses only **3B activated parameters** for self-play fine-tuning, making it significantly more efficient than comparable closed-source models.

---

## 📂 Resources

- 📄 Paper: [PromptCoT 2.0](https://arxiv.org/abs/2509.19894)
- 💻 GitHub: [inclusionAI/PromptCoT](https://github.com/inclusionAI/PromptCoT)
- 📊 Dataset: [PromptCoT-2.0-SelfPlay-30B-11K](https://huggingface.co/datasets/xl-zhao/PromptCoT-2.0-SelfPlay-30B-11K)

---

## 📜 Citation

If you find this model useful, please consider citing:

````bibtex
@article{zhao2025promptcot2,
  title   = {PromptCoT 2.0: Scaling Prompt Synthesis for Large Language Model Reasoning},
  author  = {Zhao, Xueliang and Wu, Wei and Guan, Jian and Gong, Zhuocheng and Kong, Lingpeng},
  journal = {arXiv preprint arXiv:2509.19894},
  year    = {2025},
  url     = {https://arxiv.org/abs/2509.19894}
}
````