- Woodpecker: Hallucination Correction for Multimodal Large Language Models
  Paper • 2310.16045 • Published • 17
- HallusionBench: You See What You Think? Or You Think What You See? An Image-Context Reasoning Benchmark Challenging for GPT-4V(ision), LLaVA-1.5, and Other Multi-modality Models
  Paper • 2310.14566 • Published • 27
- SILC: Improving Vision Language Pretraining with Self-Distillation
  Paper • 2310.13355 • Published • 9
- Conditional Diffusion Distillation
  Paper • 2310.01407 • Published • 20
Collections
Collections including paper arxiv:2309.11419
- Scalable Diffusion Models with Transformers
  Paper • 2212.09748 • Published • 18
- Stable Video Diffusion: Scaling Latent Video Diffusion Models to Large Datasets
  Paper • 2311.15127 • Published • 14
- Learning Transferable Visual Models From Natural Language Supervision
  Paper • 2103.00020 • Published • 15
- U-Net: Convolutional Networks for Biomedical Image Segmentation
  Paper • 1505.04597 • Published • 11
- MEGA: Multilingual Evaluation of Generative AI
  Paper • 2303.12528 • Published
- MEGAVERSE: Benchmarking Large Language Models Across Languages, Modalities, Models and Tasks
  Paper • 2311.07463 • Published • 15
- Kosmos-2.5: A Multimodal Literate Model
  Paper • 2309.11419 • Published • 50
- A Unified View of Masked Image Modeling
  Paper • 2210.10615 • Published
- Language models are weak learners
  Paper • 2306.14101 • Published • 10
- Large Language Models as Tax Attorneys: A Case Study in Legal Capabilities Emergence
  Paper • 2306.07075 • Published • 10
- TableGPT: Towards Unifying Tables, Nature Language and Commands into One GPT
  Paper • 2307.08674 • Published • 48
- Nougat: Neural Optical Understanding for Academic Documents
  Paper • 2308.13418 • Published • 39
- Kosmos-2: Grounding Multimodal Large Language Models to the World
  Paper • 2306.14824 • Published • 34
- Kosmos-G: Generating Images in Context with Multimodal Large Language Models
  Paper • 2310.02992 • Published • 4
- Kosmos-2.5: A Multimodal Literate Model
  Paper • 2309.11419 • Published • 50
- AnyMAL: An Efficient and Scalable Any-Modality Augmented Language Model
  Paper • 2309.16058 • Published • 56