view changelog Changelog Inference Providers now fully support OpenAI-compatible API 19 days ago • 73
Coding Triangle: How Does Large Language Model Understand Code? Paper • 2507.06138 • Published 29 days ago • 20
Pre-Trained Policy Discriminators are General Reward Models Paper • 2507.05197 • Published 29 days ago • 38
view article Article The AI Paradigm Shift Is Here: 4 Disruptive Trends from the Top 50 Hugging Face Papers of Q2 2025 By vansin • Jul 2 • 2
AnnaAgent: Dynamic Evolution Agent System with Multi-Session Memory for Realistic Seeker Simulation Paper • 2506.00551 • Published May 31 • 3
Dynamic Benchmarking of Reasoning Capabilities in Code Large Language Models Under Data Contamination Paper • 2503.04149 • Published Mar 6 • 6
CPRet: A Dataset, Benchmark, and Model for Retrieval in Competitive Programming Paper • 2505.12925 • Published May 19 • 2
Reflect, Retry, Reward: Self-Improving LLMs via Reinforcement Learning Paper • 2505.24726 • Published May 30 • 267
Insights into DeepSeek-V3: Scaling Challenges and Reflections on Hardware for AI Architectures Paper • 2505.09343 • Published May 14 • 68
InternVL3: Exploring Advanced Training and Test-Time Recipes for Open-Source Multimodal Models Paper • 2504.10479 • Published Apr 14 • 280
ProJudge: A Multi-Modal Multi-Discipline Benchmark and Instruction-Tuning Dataset for MLLM-based Process Judges Paper • 2503.06553 • Published Mar 9 • 8
Feature-Level Insights into Artificial Text Detection with Sparse Autoencoders Paper • 2503.03601 • Published Mar 5 • 233
MinorBench: A hand-built benchmark for content-based risks for children Paper • 2503.10242 • Published Mar 13 • 5
VisualSimpleQA: A Benchmark for Decoupled Evaluation of Large Vision-Language Models in Fact-Seeking Question Answering Paper • 2503.06492 • Published Mar 9 • 11
YuE: Scaling Open Foundation Models for Long-Form Music Generation Paper • 2503.08638 • Published Mar 11 • 68
SARChat-Bench-2M: A Multi-Task Vision-Language Benchmark for SAR Image Interpretation Paper • 2502.08168 • Published Feb 12 • 12
Exploring the Limit of Outcome Reward for Learning Mathematical Reasoning Paper • 2502.06781 • Published Feb 10 • 61