olmOCR: Unlocking Trillions of Tokens in PDFs with Vision Language Models Paper • 2502.18443 • Published Feb 25 • 9
Concerto: Joint 2D-3D Self-Supervised Learning Emerges Spatial Representations Paper • 2510.23607 • Published 8 days ago • 172
Mem0: Building Production-Ready AI Agents with Scalable Long-Term Memory Paper • 2504.19413 • Published Apr 28 • 28
A Survey of Data Agents: Emerging Paradigm or Overstated Hype? Paper • 2510.23587 • Published 8 days ago • 64
PokeeResearch: Effective Deep Research via Reinforcement Learning from AI Feedback and Robust Reasoning Scaffold Paper • 2510.15862 • Published 18 days ago • 7
Pico-Banana-400K: A Large-Scale Dataset for Text-Guided Image Editing Paper • 2510.19808 • Published 13 days ago • 28
Skyfall-GS: Synthesizing Immersive 3D Urban Scenes from Satellite Imagery Paper • 2510.15869 • Published 18 days ago • 44
Build Your Personalized Research Group: A Multiagent Framework for Continual and Interactive Science Automation Paper • 2510.15624 • Published 18 days ago • 14
WithAnyone: Towards Controllable and ID Consistent Image Generation Paper • 2510.14975 • Published 19 days ago • 80
PaddleOCR-VL: Boosting Multilingual Document Parsing via a 0.9B Ultra-Compact Vision-Language Model Paper • 2510.14528 • Published 19 days ago • 80
Diffusion Transformers with Representation Autoencoders Paper • 2510.11690 • Published 22 days ago • 160
QeRL: Beyond Efficiency -- Quantization-enhanced Reinforcement Learning for LLMs Paper • 2510.11696 • Published 22 days ago • 172
MinerU2.5: A Decoupled Vision-Language Model for Efficient High-Resolution Document Parsing Paper • 2509.22186 • Published Sep 26 • 127
In-the-Flow Agentic System Optimization for Effective Planning and Tool Use Paper • 2510.05592 • Published 29 days ago • 96
Less is More: Recursive Reasoning with Tiny Networks Paper • 2510.04871 • Published 29 days ago • 463
The Dragon Hatchling: The Missing Link between the Transformer and Models of the Brain Paper • 2509.26507 • Published Sep 30 • 523