---
license: mit
language:
- en
base_model:
- Qwen/Qwen3-30B-A3B-Thinking-2507
---

# PromptCoT-2.0-SelfPlay-30B-A3B

This model is part of **PromptCoT 2.0** (*Scaling Prompt Synthesis for LLM Reasoning*). It is a **30B-A3B model trained via self-play**, where problems synthesized by PromptCoT 2.0 provide **verifiable feedback** (unit tests for code, boxed answers for math). The training loop uses **Direct Preference Optimization (DPO)** to align generations with automatically verified outcomes, removing the dependence on stronger external teachers.

This model achieves **state-of-the-art performance at the 30B scale**, competitive with closed-source models such as Gemini 2.5 Pro and OpenAI o3.

---

## ✨ Highlights

- **Self-Play Training**: The model improves autonomously using **synthetic math & code problems** generated by PromptCoT 2.0. Positive/negative pairs are constructed from verifiable feedback signals (unit-test success / final-answer correctness); see the sketch below.
- **Competitive with Closed-Source Models**: Despite activating only **3B parameters** per token, the model achieves results comparable to **Gemini 2.5 Pro** and **OpenAI o3** across both math and code.
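
For intuition, here is a minimal sketch of how verifiable feedback can be turned into DPO preference pairs. It is illustrative only: the helper names (`check_boxed_answer`, `run_unit_tests`, `build_dpo_pairs`), the record layout, and the one-to-one pairing strategy are assumptions made for this card, not the actual PromptCoT 2.0 training code (see the GitHub repository for that).

```python
# Illustrative sketch: turning verifiable feedback into DPO preference pairs.
# Helper names, record layout, and pairing strategy are assumptions for
# illustration, not the actual PromptCoT 2.0 pipeline.
import re
import subprocess
import sys
import tempfile


def check_boxed_answer(completion: str, reference: str) -> bool:
    """Math feedback: compare the last \\boxed{...} in the completion with the
    reference answer (simplified: no nested braces or algebraic equivalence)."""
    matches = re.findall(r"\\boxed\{([^{}]*)\}", completion)
    return bool(matches) and matches[-1].strip() == reference.strip()


def run_unit_tests(program: str, tests: list[str], timeout: float = 10.0) -> bool:
    """Code feedback: run the candidate program followed by assert-style tests
    in a subprocess; success means the script exits cleanly."""
    source = program + "\n\n" + "\n".join(tests) + "\n"
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(source)
        path = f.name
    try:
        result = subprocess.run([sys.executable, path], capture_output=True, timeout=timeout)
        return result.returncode == 0
    except subprocess.TimeoutExpired:
        return False


def build_dpo_pairs(problem: dict, completions: list[str]) -> list[dict]:
    """Split sampled completions into verified (chosen) and failed (rejected)
    groups and emit (prompt, chosen, rejected) records for DPO training."""
    if problem["type"] == "math":
        verified = [c for c in completions if check_boxed_answer(c, problem["answer"])]
    else:  # "code"
        verified = [c for c in completions if run_unit_tests(c, problem["tests"])]
    failed = [c for c in completions if c not in verified]
    # Pair verified and failed completions one-to-one to keep the dataset balanced.
    return [
        {"prompt": problem["prompt"], "chosen": ch, "rejected": rj}
        for ch, rj in zip(verified, failed)
    ]
```

Records in this `(prompt, chosen, rejected)` format can then be consumed by standard DPO trainers (e.g., TRL's `DPOTrainer`).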

---

## 📊 Results

Performance of PromptCoT-2.0-SelfPlay-30B-A3B on six benchmarks (AIME24/25, HMMT Feb25, LiveCodeBench v5/v6, Codeforces): the model achieves results competitive with Gemini 2.5 Pro and OpenAI o3 while surpassing strong open-source baselines.
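
For a quick try, here is a minimal inference sketch with 🤗 Transformers. The repository id, sampling settings, and chat-template call below are assumptions carried over from the Qwen3-30B-A3B-Thinking-2507 base model; refer to the GitHub repository for the recommended setup.

```python
# Minimal inference sketch. Assumptions: the repo id below and a transformers
# version with Qwen3-MoE support; adjust to the official instructions.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "xl-zhao/PromptCoT-2.0-SelfPlay-30B-A3B"  # assumed repository id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")

messages = [
    {"role": "user", "content": "Find all positive integers n such that n + 1 divides n^2 + 1."}
]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

# Thinking models emit a long reasoning trace before the final boxed answer,
# so allow a generous generation budget.
outputs = model.generate(inputs, max_new_tokens=4096, do_sample=True, temperature=0.6, top_p=0.95)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```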

---

## 🔮 Key Takeaways

- **Math + code reasoning**: Strong, balanced gains across both **Olympiad-level math** (AIME, HMMT) and **competitive programming** (LiveCodeBench, Codeforces).
- **Efficient scaling**: Uses only **3B activated parameters** for self-play fine-tuning, making it significantly more efficient than comparable closed-source models.

---

## 📂 Resources

- 📄 Paper: [PromptCoT 2.0](https://arxiv.org/abs/2509.19894)
- 💻 GitHub: [inclusionAI/PromptCoT](https://github.com/inclusionAI/PromptCoT)
- 📊 Dataset: [PromptCoT-2.0-SelfPlay-30B-11K](https://huggingface.co/datasets/xl-zhao/PromptCoT-2.0-SelfPlay-30B-11K)

---

## 📜 Citation

If you find this model useful, please consider citing:

````bibtex
@article{zhao2025promptcot2,
  title   = {PromptCoT 2.0: Scaling Prompt Synthesis for Large Language Model Reasoning},
  author  = {Zhao, Xueliang and Wu, Wei and Guan, Jian and Gong, Zhuocheng and Kong, Lingpeng},
  journal = {arXiv preprint arXiv:2509.19894},
  year    = {2025},
  url     = {https://arxiv.org/abs/2509.19894}
}
````