Johannes Kolbe PRO
johko
AI & ML interests
None yet
Recent Activity
updated a Space 4 days ago
johko/llm-blind-date
posted an update 4 days ago
One prompt, three answers - which model is from where?
https://huggingface.co/spaces/johko/llm-blind-date
I built a little demo where you give three models (Apertus, Llama, Qwen3) the same prompt, and in the end you have to guess which is which based only on their answers.
Give it a try! ;)
upvoted a paper 5 days ago
Apertus: Democratizing Open and Compliant LLMs for Global Language Environments
Organizations
Deceptive Prompts for MLLMs
- A Survey on Hallucination in Large Vision-Language Models
Paper • 2402.00253 • Published
- Mitigating Object Hallucination in Large Vision-Language Models via Classifier-Free Guidance
Paper • 2402.08680 • Published • 1
- How Easy is It to Fool Your Multimodal LLMs? An Empirical Analysis on Deceptive Prompts
Paper • 2402.13220 • Published • 14
- FGAIF: Aligning Large Vision-Language Models with Fine-grained AI Feedback
Paper • 2404.05046 • Published
Virtual Try-On
- IMAGDressing-v1: Customizable Virtual Dressing
Paper • 2407.12705 • Published • 13
- Dress Code: High-Resolution Multi-Category Virtual Try-On
Paper • 2204.08532 • Published • 2
- OOTDiffusion: Outfitting Fusion based Latent Diffusion for Controllable Virtual Try-on
Paper • 2403.01779 • Published • 30
- Multimodal-Conditioned Latent Diffusion Models for Fashion Image Editing
Paper • 2403.14828 • Published
Consistent Image Generation
- Training-Free Consistent Text-to-Image Generation
Paper • 2402.03286 • Published • 67
- The Chosen One: Consistent Characters in Text-to-Image Diffusion Models
Paper • 2311.10093 • Published • 58
- DreamMatcher: Appearance Matching Self-Attention for Semantically-Consistent Text-to-Image Personalization
Paper • 2402.09812 • Published • 16
- StoryDiffusion: Consistent Self-Attention for Long-Range Image and Video Generation
Paper • 2405.01434 • Published • 56
VLM Interleaved Images
- LLaVA-NeXT-Interleave: Tackling Multi-image, Video, and 3D in Large Multimodal Models
Paper • 2407.07895 • Published • 42
- SEED-Story: Multimodal Long Story Generation with Large Language Model
Paper • 2407.08683 • Published • 24
- ANOLE: An Open, Autoregressive, Native Large Multimodal Models for Interleaved Image-Text Generation
Paper • 2407.06135 • Published • 22
- InternLM-XComposer-2.5: A Versatile Large Vision Language Model Supporting Long-Contextual Input and Output
Paper • 2407.03320 • Published • 94
Text driven Image Editing
Point Tracking
- CoTracker
🎨 279 • Track points in a video
- CoTracker: It is Better to Track Together
Paper • 2307.07635 • Published • 19
- TAPIR: Tracking Any Point with per-frame Initialization and temporal Refinement
Paper • 2306.08637 • Published
- DINO-Tracker: Taming DINO for Self-Supervised Point Tracking in a Single Video
Paper • 2403.14548 • Published