---
license: apache-2.0
language:
- en
pipeline_tag: text-generation
library_name: transformers
---
# 🧠 A²FM: Adaptive Agent Foundation Model for Tool-Aware Hybrid Reasoning
**A²FM (Adaptive Agent Foundation Model)** unifies reasoning-centric and agentic paradigms into a single framework that adaptively selects among **three execution modes**: *instant*, *reasoning*, and *agentic*.
It follows a **route-then-align** training principle and introduces **Adaptive Policy Optimization (APO)** to jointly optimize accuracy and efficiency.
A²FM achieves **state-of-the-art performance** on major reasoning and agentic benchmarks:
- **13.4%** on *BrowseComp* (agentic)
- **70.4%** on *AIME25* (reasoning)
- **16.7%** on *HLE* (general)
Notably, its adaptive execution achieves a **cost-of-pass of only \$0.00487 per correct answer**, cutting cost by **45.2%** vs. the reasoning mode and **33.5%** vs. the agentic mode, delivering substantially higher cost efficiency at comparable accuracy.
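Here, cost-of-pass is read as the expected inference cost to obtain one correct answer (this is the common reading of the metric; see the paper for the exact formulation):

$$
\text{cost-of-pass} = \frac{\text{expected cost per attempt}}{\text{pass rate}}
$$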
📄 [**Paper**](https://arxiv.org/abs/2510.12838)
💻 [**GitHub**](https://github.com/OPPO-PersonalAI/Adaptive_Agent_Foundation_Models)
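Below is a minimal generation sketch with 🤗 Transformers. The Hub model ID is an assumption based on this repo's name, and the snippet covers plain text generation only; see the GitHub repo for the exact inference and tool-calling setup.

```python
# Minimal text-generation sketch (assumed Hub ID; the full agentic/tool-calling
# pipeline lives in the GitHub repo and is not reproduced here).
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "OPPO-PersonalAI/A2FM-32B-rl"  # assumption based on this repo's name

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto"
)

# A simple query; the model itself routes it to an instant/reasoning/agentic mode.
messages = [{"role": "user", "content": "What is 17 * 24?"}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=512)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```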
---
## 🔑 Key Highlights
- ⚙️ **Unified reasoning & agentic modeling**
Integrates direct reasoning, chain-of-thought, and tool-augmented actions within a single backbone.
- 🔄 **Route-then-Align supervised fine-tuning**
Trains task-aware routing followed by mode-aligned trajectory learning.
- 🧩 **Adaptive Policy Optimization (APO)**
Reinforcement learning with adaptive sampling and a cost-regularized reward that balances efficiency against accuracy (sketched after this list).
- 💡 **Substantially lower inference cost**
Adaptive routing cuts redundant reasoning/tool use while preserving correctness.
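As a rough illustration of the cost-regularized reward idea (a sketch of the general principle, not A²FM's exact APO objective; the `lam` weight and dollar-cost accounting below are assumptions):

```python
# Illustrative cost-regularized reward: reward correctness, penalize spend.
# This is a sketch of the general idea, not A2FM's exact APO formulation.
def cost_regularized_reward(correct: bool, cost_usd: float, lam: float = 10.0) -> float:
    """Return task reward minus a cost penalty (lam is an assumed weight)."""
    return float(correct) - lam * cost_usd

# Two correct rollouts: the cheaper one earns the higher reward, so the
# policy is pushed toward the cheapest mode that still answers correctly.
print(cost_regularized_reward(True, cost_usd=0.005))  # 0.95
print(cost_regularized_reward(True, cost_usd=0.020))  # 0.80
```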
---
## 📘 Citation
```bibtex
@article{chen2025a2fm,
  title={A{$^2$}FM: An Adaptive Agent Foundation Model for Tool-Aware Hybrid Reasoning},
  author={Chen, Qianben and Cao, Jingyi and Zhang, Jiayu and Qin, Tianrui and Li, Xiaowan and Zhu, King and Shi, Dingfeng and Zhu, He and Liu, Minghao and Liang, Xiaobo and others},
  journal={arXiv preprint arXiv:2510.12838},
  year={2025}
}
```