- Woodpecker: Hallucination Correction for Multimodal Large Language Models
  Paper • 2310.16045 • Published • 17
- HallusionBench: You See What You Think? Or You Think What You See? An Image-Context Reasoning Benchmark Challenging for GPT-4V(ision), LLaVA-1.5, and Other Multi-modality Models
  Paper • 2310.14566 • Published • 27
- SILC: Improving Vision Language Pretraining with Self-Distillation
  Paper • 2310.13355 • Published • 9
- Conditional Diffusion Distillation
  Paper • 2310.01407 • Published • 20
Collections
Collections including paper arxiv:2309.11419
- Scalable Diffusion Models with Transformers
  Paper • 2212.09748 • Published • 18
- Stable Video Diffusion: Scaling Latent Video Diffusion Models to Large Datasets
  Paper • 2311.15127 • Published • 14
- Learning Transferable Visual Models From Natural Language Supervision
  Paper • 2103.00020 • Published • 15
- U-Net: Convolutional Networks for Biomedical Image Segmentation
  Paper • 1505.04597 • Published • 11
- MEGA: Multilingual Evaluation of Generative AI
  Paper • 2303.12528 • Published
- MEGAVERSE: Benchmarking Large Language Models Across Languages, Modalities, Models and Tasks
  Paper • 2311.07463 • Published • 15
- Kosmos-2.5: A Multimodal Literate Model
  Paper • 2309.11419 • Published • 50
- A Unified View of Masked Image Modeling
  Paper • 2210.10615 • Published
- Language models are weak learners
  Paper • 2306.14101 • Published • 10
- Large Language Models as Tax Attorneys: A Case Study in Legal Capabilities Emergence
  Paper • 2306.07075 • Published • 10
- TableGPT: Towards Unifying Tables, Nature Language and Commands into One GPT
  Paper • 2307.08674 • Published • 48
- Nougat: Neural Optical Understanding for Academic Documents
  Paper • 2308.13418 • Published • 39
- Kosmos-2: Grounding Multimodal Large Language Models to the World
  Paper • 2306.14824 • Published • 34
- Kosmos-G: Generating Images in Context with Multimodal Large Language Models
  Paper • 2310.02992 • Published • 4
- Kosmos-2.5: A Multimodal Literate Model
  Paper • 2309.11419 • Published • 50
- AnyMAL: An Efficient and Scalable Any-Modality Augmented Language Model
  Paper • 2309.16058 • Published • 56