AutoEnv: Automated Environments for Measuring Cross-Environment Agent Learning • Paper • 2511.19304 • Published about 19 hours ago • 73
InteractComp: Evaluating Search Agents With Ambiguous Queries • Paper • 2510.24668 • Published 28 days ago • 96
ReCode: Unify Plan and Action for Universal Granularity Control • Paper • 2510.23564 • Published 29 days ago • 119
Will LLMs be Professional at Fund Investment? DeepFund: A Live Arena Perspective • Paper • 2503.18313 • Published Mar 24
Time Travel is Cheating: Going Live with DeepFund for Real-Time Fund Investment Benchmarking • Paper • 2505.11065 • Published May 16
Advances and Challenges in Foundation Agents: From Brain-Inspired Intelligence to Evolutionary, Collaborative, and Safe Systems • Paper • 2504.01990 • Published Mar 31 • 300
Trustworthy LLMs: a Survey and Guideline for Evaluating Large Language Models' Alignment • Paper • 2308.05374 • Published Aug 10, 2023 • 28