yourbench/yourbench-custom-prompts-example-gpt-4.1 Viewer • Updated about 14 hours ago • 55 • 15
yourbench/yourbench-custom-prompts-example-oss-120b Viewer • Updated about 14 hours ago • 3 • 11
FineWeb2: One Pipeline to Scale Them All -- Adapting Pre-Training Data Processing to Every Language Paper • 2506.20920 • Published Jun 26 • 64
From Context to Action: Analysis of the Impact of State Representation and Context on the Generalization of Multi-Turn Web Navigation Agents Paper • 2410.23555 • Published Oct 31, 2024
Better Slow than Sorry: Introducing Positive Friction for Reliable Dialogue Systems Paper • 2501.17348 • Published Jan 28
TD-EVAL: Revisiting Task-Oriented Dialogue Evaluation by Combining Turn-Level Precision with Dialogue-Level Comparisons Paper • 2504.19982 • Published Apr 28
Language Specific Knowledge: Do Models Know Better in X than in English? Paper • 2505.14990 • Published May 21 • 1
PIPA: A Unified Evaluation Protocol for Diagnosing Interactive Planning Agents Paper • 2505.01592 • Published May 2
SmolVLA: A Vision-Language-Action Model for Affordable and Efficient Robotics Paper • 2506.01844 • Published Jun 2 • 122