Collections
Discover the best community collections!
Collections including paper arxiv:2503.18908
- Collection
  - Reducing Transformer Key-Value Cache Size with Cross-Layer Attention
    Paper • 2405.12981 • Published • 34
  - TinyR1-32B-Preview: Boosting Accuracy with Branch-Merge Distillation
    Paper • 2503.04872 • Published • 15
  - FFN Fusion: Rethinking Sequential Computation in Large Language Models
    Paper • 2503.18908 • Published • 20
- Collection
  - VILA^2: VILA Augmented VILA
    Paper • 2407.17453 • Published • 42
  - Octopus v4: Graph of language models
    Paper • 2404.19296 • Published • 119
  - Octo-planner: On-device Language Model for Planner-Action Agents
    Paper • 2406.18082 • Published • 49
  - Dolphin: Long Context as a New Modality for Energy-Efficient On-Device Language Models
    Paper • 2408.15518 • Published • 43
- Collection
  - The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits
    Paper • 2402.17764 • Published • 624
  - BitNet: Scaling 1-bit Transformers for Large Language Models
    Paper • 2310.11453 • Published • 104
  - Mixture-of-Depths: Dynamically allocating compute in transformer-based language models
    Paper • 2404.02258 • Published • 107
  - TransformerFAM: Feedback attention is working memory
    Paper • 2404.09173 • Published • 44
- Collection
  - Megalodon: Efficient LLM Pretraining and Inference with Unlimited Context Length
    Paper • 2404.08801 • Published • 68
  - RecurrentGemma: Moving Past Transformers for Efficient Open Language Models
    Paper • 2404.07839 • Published • 48
  - Eagle and Finch: RWKV with Matrix-Valued States and Dynamic Recurrence
    Paper • 2404.05892 • Published • 39
  - Mamba: Linear-Time Sequence Modeling with Selective State Spaces
    Paper • 2312.00752 • Published • 145
- Collection
  - ControlLLM: Augment Language Models with Tools by Searching on Graphs
    Paper • 2310.17796 • Published • 18
  - Fast Chain-of-Thought: A Glance of Future from Parallel Decoding Leads to Answers Faster
    Paper • 2311.08263 • Published • 16
  - Kimi k1.5: Scaling Reinforcement Learning with LLMs
    Paper • 2501.12599 • Published • 123
  - ARR: Question Answering with Large Language Models via Analyzing, Retrieving, and Reasoning
    Paper • 2502.04689 • Published • 7