RLPR: Extrapolating RLVR to General Domains without Verifiers • Paper • 2506.18254 • Published Jun 23 • 32
MiniCPM-o & MiniCPM-V • Collection • Multimodal models with leading performance. • 24 items • Updated about 16 hours ago • 37
RLHF-V: Towards Trustworthy MLLMs via Behavior Alignment from Fine-grained Correctional Human Feedback • Paper • 2312.00849 • Published Dec 1, 2023 • 12