Benjamin Feuer

penfever

AI & ML interests

Deep learning, computer vision, large language models, large vision language models

Recent Activity

updated a dataset about 15 hours ago

DCAgent/GPT-5-mini-71-tasks-fix

updated a model about 17 hours ago

DCAgent/dev-set-71-tasks-fixed-nov-7

published a dataset about 17 hours ago

DCAgent/GPT-5-mini-71-tasks-fix

commented on 2 papers about 1 month ago

When Judgment Becomes Noise: How Design Failures in LLM Judge Benchmarks Silently Undermine Validity

Paper • 2509.20293 • Published Sep 24 • 7

commented on a paper 4 months ago

MARVIS: Modality Adaptive Reasoning over VISualizations

Paper • 2507.01544 • Published Jul 2 • 13

New activity in nyu-dice-lab/sos-artifacts 6 months ago

QwQ_32B_setting_5

#4 opened 6 months ago by

QwQ_32B_Setting_4

#3 opened 6 months ago by

More model answers

#2 opened 6 months ago by

New activity in nyu-dice-lab/sos-artifacts 7 months ago

QwQ-no-curation-on-new-models

#1 opened 7 months ago by

commented on 2 papers about 1 year ago

SELECT: A Large-Scale Benchmark of Data Curation Strategies for Image Classification

Paper • 2410.05057 • Published Oct 7, 2024 • 7

Style over Substance: Failure Modes of LLM Judges in Alignment Benchmarking

Paper • 2409.15268 • Published Sep 23, 2024 • 13