AgentCoMa: A Compositional Benchmark Mixing Commonsense and Mathematical Reasoning in Real-World Scenarios Paper • 2508.19988 • Published Aug 27, 2025
Meta-Reasoning Improves Tool Use in Large Language Models Paper • 2411.04535 • Published Nov 7, 2024
How can representation dimension dominate structurally pruned LLMs? Paper • 2503.04377 • Published Mar 6, 2025
Enhancing LLM Robustness to Perturbed Instructions: An Empirical Study Paper • 2504.02733 • Published Apr 3, 2025
Reverse Engineering Human Preferences with Reinforcement Learning Paper • 2505.15795 • Published May 21, 2025