Yang Shi

DogNeverSleep

https://FrankYang-17.github.io/

FrankYang-17

AI & ML interests

👨🏻‍🎓PhD student at Peking University

Recent Activity

authored a paper 12 days ago

Monet: Reasoning in Latent Visual Space Beyond Images and Language

upvoted a paper 12 days ago

Monet: Reasoning in Latent Visual Space Beyond Images and Language

commented on a paper 12 days ago

Monet: Reasoning in Latent Visual Space Beyond Images and Language

commented on a paper 12 days ago

Monet: Reasoning in Latent Visual Space Beyond Images and Language

Paper • 2511.21395 • Published 13 days ago • 15

commented on a paper about 1 month ago

When Modalities Conflict: How Unimodal Reasoning Uncertainty Governs Preference Dynamics in MLLMs

Paper • 2511.02243 • Published Nov 4 • 24

commented on a paper about 2 months ago

AVoCaDO: An Audiovisual Video Captioner Driven by Temporal Orchestration

Paper • 2510.10395 • Published Oct 12 • 29

commented on 2 papers 2 months ago

OpenGPT-4o-Image: A Comprehensive Dataset for Advanced Image Generation and Editing

Paper • 2509.24900 • Published Sep 29 • 53

RealUnify: Do Unified Models Truly Benefit from Unification? A Comprehensive Benchmark

Paper • 2509.24897 • Published Sep 29 • 46

New activity in DogNeverSleep/MME-VideoOCR_Dataset 6 months ago

Add paper link, license

#2 opened 6 months ago by

commented on a paper 7 months ago

MME-VideoOCR: Evaluating OCR-Based Capabilities of Multimodal LLMs in Video Scenarios

Paper • 2505.21333 • Published May 27 • 38

commented on a paper 8 months ago

Mavors: Multi-granularity Video Representation for Multimodal Large Language Model

Paper • 2504.10068 • Published Apr 14 • 30