Local Linear Attention: An Optimal Interpolation of Linear and Softmax Attention For Test-Time Regression Paper • 2510.01450 • Published Oct 1 • 2
YifeiZuo/50M-Muon-LR4e-3-WM0-STEP200000-BZ256-SEQ4096-checkpoint-300 Text Generation • 49.5M • Updated Apr 14
YifeiZuo/50M-Muon-LR4e-3-WM0-STEP200000-BZ256-SEQ4096-checkpoint-200 Text Generation • 49.5M • Updated Apr 14
YifeiZuo/50M-Muon-LR4e-3-WM0-STEP200000-BZ256-SEQ4096-checkpoint-100 Text Generation • 49.5M • Updated Apr 14