---
license: apache-2.0
language:
- en
pipeline_tag: text-generation
library_name: transformers
---
# 🧠 A²FM: Adaptive Agent Foundation Model for Tool-Aware Hybrid Reasoning
**A²FM (Adaptive Agent Foundation Model)** unifies reasoning-centric and agentic paradigms into a single framework that adaptively selects among **three execution modes**: *instant*, *reasoning*, and *agentic*.
It follows a **route-then-align** training principle and introduces **Adaptive Policy Optimization (APO)** to jointly optimize accuracy and efficiency.
A²FM achieves **state-of-the-art performance** on major reasoning and agentic benchmarks:
- **13.4%** on *BrowseComp* (agentic)
- **70.4%** on *AIME25* (reasoning)
- **16.7%** on *HLE* (general)
Notably, its adaptive execution achieves a **cost-of-pass of only \$0.00487 per correct answer**, cutting cost by **45.2%** vs. the reasoning mode and **33.5%** vs. the agentic mode, delivering substantially higher cost efficiency at comparable accuracy.
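Here, cost-of-pass is read as the expected inference cost to obtain one correct answer (this is the common reading of the metric; see the paper for the exact formulation):

$$
\text{cost-of-pass} = \frac{\text{expected cost per attempt}}{\text{pass rate}}
$$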
📄 [**Paper**](https://arxiv.org/abs/2510.12838)
💻 [**GitHub**](https://github.com/OPPO-PersonalAI/Adaptive_Agent_Foundation_Models)
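Below is a minimal generation sketch with 🤗 Transformers. The Hub model ID is an assumption based on this repo's name, and the snippet covers plain text generation only; see the GitHub repo for the exact inference and tool-calling setup.

```python
# Minimal text-generation sketch (assumed Hub ID; the full agentic/tool-calling
# pipeline lives in the GitHub repo and is not reproduced here).
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "OPPO-PersonalAI/A2FM-32B-rl"  # assumption based on this repo's name

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto"
)

# A simple query; the model itself routes it to an instant/reasoning/agentic mode.
messages = [{"role": "user", "content": "What is 17 * 24?"}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=512)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```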
---
## 🔑 Key Highlights
- ⚙️ **Unified reasoning & agentic modeling**
Integrates direct reasoning, chain-of-thought, and tool-augmented actions within a single backbone.
- 🔄 **Route-then-Align supervised fine-tuning**
Trains task-aware routing followed by mode-aligned trajectory learning.
- 🧩 **Adaptive Policy Optimization (APO)**
Reinforcement learning with adaptive sampling and a cost-regularized reward that balances efficiency against accuracy (sketched after this list).
- 💡 **Substantially lower inference cost**
Adaptive routing cuts redundant reasoning/tool use while preserving correctness.
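As a rough illustration of the cost-regularized reward idea (a sketch of the general principle, not A²FM's exact APO objective; the `lam` weight and dollar-cost accounting below are assumptions):

```python
# Illustrative cost-regularized reward: reward correctness, penalize spend.
# This is a sketch of the general idea, not A2FM's exact APO formulation.
def cost_regularized_reward(correct: bool, cost_usd: float, lam: float = 10.0) -> float:
    """Return task reward minus a cost penalty (lam is an assumed weight)."""
    return float(correct) - lam * cost_usd

# Two correct rollouts: the cheaper one earns the higher reward, so the
# policy is pushed toward the cheapest mode that still answers correctly.
print(cost_regularized_reward(True, cost_usd=0.005))  # 0.95
print(cost_regularized_reward(True, cost_usd=0.020))  # 0.80
```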
---
## 📘 Citation
```bibtex
@article{chen2025a2fm,
  title={A{$^2$}FM: An Adaptive Agent Foundation Model for Tool-Aware Hybrid Reasoning},
  author={Chen, Qianben and Cao, Jingyi and Zhang, Jiayu and Qin, Tianrui and Li, Xiaowan and Zhu, King and Shi, Dingfeng and Zhu, He and Liu, Minghao and Liang, Xiaobo and others},
  journal={arXiv preprint arXiv:2510.12838},
  year={2025}
}
```