Collections including paper arxiv:2502.13595

- GAIA: a benchmark for General AI Assistants
  Paper • 2311.12983 • Published • 226
- MMMU: A Massive Multi-discipline Multimodal Understanding and Reasoning Benchmark for Expert AGI
  Paper • 2311.16502 • Published • 35
- BLINK: Multimodal Large Language Models Can See but Not Perceive
  Paper • 2404.12390 • Published • 27
- RULER: What's the Real Context Size of Your Long-Context Language Models?
  Paper • 2404.06654 • Published • 39

- Self-Boosting Large Language Models with Synthetic Preference Data
  Paper • 2410.06961 • Published • 17
- Qwen2.5 Technical Report
  Paper • 2412.15115 • Published • 373
- SCOPE: Optimizing Key-Value Cache Compression in Long-context Generation
  Paper • 2412.13649 • Published • 20
- NeoBERT: A Next-Generation BERT
  Paper • 2502.19587 • Published • 40

- CS-Bench: A Comprehensive Benchmark for Large Language Models towards Computer Science Mastery
  Paper • 2406.08587 • Published • 16
- Test of Time: A Benchmark for Evaluating LLMs on Temporal Reasoning
  Paper • 2406.09170 • Published • 28
- AppWorld: A Controllable World of Apps and People for Benchmarking Interactive Coding Agents
  Paper • 2407.18901 • Published • 35
- Benchmarking Agentic Workflow Generation
  Paper • 2410.07869 • Published • 28

- MMTEB: Massive Multilingual Text Embedding Benchmark
  Paper • 2502.13595 • Published • 38
- MTEB: Massive Text Embedding Benchmark
  Paper • 2210.07316 • Published • 6
- The Scandinavian Embedding Benchmarks: Comprehensive Assessment of Multilingual and Monolingual Text Embedding
  Paper • 2406.02396 • Published
- Extending the Massive Text Embedding Benchmark to French
  Paper • 2405.20468 • Published • 2

- Offline Reinforcement Learning for LLM Multi-Step Reasoning
  Paper • 2412.16145 • Published • 39
- SPLADE-v3: New baselines for SPLADE
  Paper • 2403.06789 • Published • 4
- MMTEB: Massive Multilingual Text Embedding Benchmark
  Paper • 2502.13595 • Published • 38
- Reducing the Footprint of Multi-Vector Retrieval with Minimal Performance Impact via Token Pooling
  Paper • 2409.14683 • Published • 12

- LoRA+: Efficient Low Rank Adaptation of Large Models
  Paper • 2402.12354 • Published • 6
- The FinBen: An Holistic Financial Benchmark for Large Language Models
  Paper • 2402.12659 • Published • 22
- TofuEval: Evaluating Hallucinations of LLMs on Topic-Focused Dialogue Summarization
  Paper • 2402.13249 • Published • 13
- TrustLLM: Trustworthiness in Large Language Models
  Paper • 2401.05561 • Published • 70