AutoMind: Adaptive Knowledgeable Agent for Automated Data Science • arXiv:2506.10974 • Jun 2025
Scientists' First Exam: Probing Cognitive Abilities of MLLM via Perception, Understanding, and Reasoning • arXiv:2506.10521 • Jun 2025
MiniMax-M1: Scaling Test-Time Compute Efficiently with Lightning Attention • arXiv:2506.13585 • Jun 2025
Surfer-H Meets Holo1: Cost-Efficient Web Agent Powered by Open Weights • arXiv:2506.02865 • Jun 2025
FusionAudio-1.2M: Towards Fine-grained Audio Captioning with Multimodal Contextual Fusion • arXiv:2506.01111 • Jun 2025
IVY-FAKE: A Unified Explainable Framework and Benchmark for Image and Video AIGC Detection • arXiv:2506.00979 • Jun 2025
Shifting AI Efficiency From Model-Centric to Data-Centric Compression • arXiv:2505.19147 • May 2025
Unified Multimodal Understanding and Generation Models: Advances, Challenges, and Opportunities • arXiv:2505.02567 • May 2025
M1: Towards Scalable Test-Time Compute with Mamba Reasoning Models • arXiv:2504.10449 • Apr 2025
The AI Scientist-v2: Workshop-Level Automated Scientific Discovery via Agentic Tree Search • arXiv:2504.08066 • Apr 2025
VisuoThink: Empowering LVLM Reasoning with Multimodal Tree Search • arXiv:2504.09130 • Apr 2025
TinyLLaVA-Video-R1: Towards Smaller LMMs for Video Reasoning • arXiv:2504.09641 • Apr 2025
Breaking the Data Barrier -- Building GUI Agents Through Task Generalization • arXiv:2504.10127 • Apr 2025
Mavors: Multi-granularity Video Representation for Multimodal Large Language Model • arXiv:2504.10068 • Apr 2025
Have we unified image generation and understanding yet? An empirical study of GPT-4o's image generation ability • arXiv:2504.08003 • Apr 2025
VL-Rethinker: Incentivizing Self-Reflection of Vision-Language Models with Reinforcement Learning • arXiv:2504.08837 • Apr 2025