Tempo14's Collections: Attention
Selective Attention Improves Transformer (arXiv:2410.02703, 24 upvotes)
Differential Transformer (arXiv:2410.05258, 179 upvotes)
TidalDecode: Fast and Accurate LLM Decoding with Position Persistent Sparse Attention (arXiv:2410.05076, 8 upvotes)
SeerAttention: Learning Intrinsic Sparse Attention in Your LLMs (arXiv:2410.13276, 29 upvotes)
Star Attention: Efficient LLM Inference over Long Sequences (arXiv:2411.17116, 55 upvotes)
KV Shifting Attention Enhances Language Modeling (arXiv:2411.19574, 9 upvotes)
Entropy-Guided Attention for Private LLMs (arXiv:2501.03489, 14 upvotes)
Not All Language Model Features Are Linear (arXiv:2405.14860, 41 upvotes)
Your Transformer is Secretly Linear (arXiv:2405.12250, 158 upvotes)
MiniMax-01: Scaling Foundation Models with Lightning Attention (arXiv:2501.08313, 298 upvotes)
Tensor Product Attention Is All You Need (arXiv:2501.06425, 89 upvotes)
Sigma: Differential Rescaling of Query, Key and Value for Efficient Language Models (arXiv:2501.13629, 48 upvotes)
TransMLA: Multi-head Latent Attention Is All You Need (arXiv:2502.07864, 58 upvotes)
Native Sparse Attention: Hardware-Aligned and Natively Trainable Sparse Attention (arXiv:2502.11089, 165 upvotes)
Does Time Have Its Place? Temporal Heads: Where Language Models Recall Time-specific Information (arXiv:2502.14258, 26 upvotes)
How Do Large Vision-Language Models See Text in Image? Unveiling the Distinctive Role of OCR Heads (arXiv:2505.15865, 4 upvotes)
Learning to Skip the Middle Layers of Transformers (arXiv:2506.21103, 18 upvotes)
Limitations of Normalization in Attention Mechanism (arXiv:2508.17821, 7 upvotes)
Native Hybrid Attention for Efficient Sequence Modeling (arXiv:2510.07019, 16 upvotes)
Attention Sinks in Diffusion Language Models (arXiv:2510.15731, 47 upvotes)
Every Attention Matters: An Efficient Hybrid Architecture for Long-Context Reasoning (arXiv:2510.19338, 110 upvotes)
arXiv:2510.23052 (28 upvotes)
Kimi Linear: An Expressive, Efficient Attention Architecture (arXiv:2510.26692, 96 upvotes)