4 652

Shaobai Jiang

shaobaij

AI & ML interests

None yet

Organizations

None yet

upvoted 2 papers about 4 hours ago

PokeeResearch: Effective Deep Research via Reinforcement Learning from AI Feedback and Robust Reasoning Scaffold

Paper • 2510.15862 • Published 22 days ago • 9

UltraCUA: A Foundation Model for Computer Use Agents with Hybrid Action

Paper • 2510.17790 • Published 19 days ago • 5

upvoted a paper about 6 hours ago

Context Engineering 2.0: The Context of Context Engineering

Paper • 2510.26493 • Published 9 days ago • 6

upvoted a paper about 24 hours ago

QueST: Incentivizing LLMs to Generate Difficult Problems

Paper • 2510.17715 • Published 19 days ago • 32

upvoted 2 papers 2 days ago

Towards Robust Mathematical Reasoning

Paper • 2511.01846 • Published 5 days ago • 7

Deep Self-Evolving Reasoning

Paper • 2510.17498 • Published 19 days ago • 11

upvoted a paper 5 days ago

Supervised Reinforcement Learning: From Expert Trajectories to Step-wise Reasoning

Paper • 2510.25992 • Published 10 days ago • 40

upvoted a paper 6 days ago

ImpossibleBench: Measuring LLMs' Propensity of Exploiting Test Cases

Paper • 2510.20270 • Published 17 days ago • 6

upvoted 12 papers 7 days ago

ReasonIF: Large Reasoning Models Fail to Follow Instructions During Reasoning

Paper • 2510.15211 • Published 23 days ago • 2

olmOCR 2: Unit Test Rewards for Document OCR

Paper • 2510.19817 • Published 17 days ago • 13

BAPO: Stabilizing Off-Policy Reinforcement Learning for LLMs via Balanced Policy Optimization with Adaptive Clipping

Paper • 2510.18927 • Published 18 days ago • 82

Text or Pixels? It Takes Half: On the Token Efficiency of Visual Text Inputs in Multimodal LLMs

Paper • 2510.18279 • Published 19 days ago • 4

Prompt-MII: Meta-Learning Instruction Induction for LLMs

Paper • 2510.16932 • Published 20 days ago • 6

Glyph: Scaling Context Windows via Visual-Text Compression

Paper • 2510.17800 • Published 19 days ago • 64

Robot Learning: A Tutorial

Paper • 2510.12403 • Published 25 days ago • 103

DeepMMSearch-R1: Empowering Multimodal LLMs in Multimodal Web Search

Paper • 2510.12801 • Published 25 days ago • 13

Memory as Action: Autonomous Context Curation for Long-Horizon Agentic Tasks

Paper • 2510.12635 • Published 25 days ago • 15

Base Models Know How to Reason, Thinking Models Learn When

Paper • 2510.07364 • Published Oct 8 • 1

Better Together: Leveraging Unpaired Multimodal Data for Stronger Unimodal Models

Paper • 2510.08492 • Published about 1 month ago • 8

RAG-Anything: All-in-One RAG Framework

Paper • 2510.12323 • Published 26 days ago • 47