Collections
Discover the best community collections!
Collections including paper arxiv:2503.18908
- Collection
  - Reducing Transformer Key-Value Cache Size with Cross-Layer Attention
    Paper • 2405.12981 • Published • 34
  - TinyR1-32B-Preview: Boosting Accuracy with Branch-Merge Distillation
    Paper • 2503.04872 • Published • 15
  - FFN Fusion: Rethinking Sequential Computation in Large Language Models
    Paper • 2503.18908 • Published • 20
- Collection
  - VILA^2: VILA Augmented VILA
    Paper • 2407.17453 • Published • 42
  - Octopus v4: Graph of language models
    Paper • 2404.19296 • Published • 119
  - Octo-planner: On-device Language Model for Planner-Action Agents
    Paper • 2406.18082 • Published • 49
  - Dolphin: Long Context as a New Modality for Energy-Efficient On-Device Language Models
    Paper • 2408.15518 • Published • 43
- Collection
  - The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits
    Paper • 2402.17764 • Published • 624
  - BitNet: Scaling 1-bit Transformers for Large Language Models
    Paper • 2310.11453 • Published • 104
  - Mixture-of-Depths: Dynamically allocating compute in transformer-based language models
    Paper • 2404.02258 • Published • 107
  - TransformerFAM: Feedback attention is working memory
    Paper • 2404.09173 • Published • 44
- Collection
  - Megalodon: Efficient LLM Pretraining and Inference with Unlimited Context Length
    Paper • 2404.08801 • Published • 68
  - RecurrentGemma: Moving Past Transformers for Efficient Open Language Models
    Paper • 2404.07839 • Published • 48
  - Eagle and Finch: RWKV with Matrix-Valued States and Dynamic Recurrence
    Paper • 2404.05892 • Published • 39
  - Mamba: Linear-Time Sequence Modeling with Selective State Spaces
    Paper • 2312.00752 • Published • 145
- Collection
  - ControlLLM: Augment Language Models with Tools by Searching on Graphs
    Paper • 2310.17796 • Published • 18
  - Fast Chain-of-Thought: A Glance of Future from Parallel Decoding Leads to Answers Faster
    Paper • 2311.08263 • Published • 16
  - Kimi k1.5: Scaling Reinforcement Learning with LLMs
    Paper • 2501.12599 • Published • 123
  - ARR: Question Answering with Large Language Models via Analyzing, Retrieving, and Reasoning
    Paper • 2502.04689 • Published • 7