Article Tensor Parallelism (TP) in Transformers: 5 Minutes to Understand 3 days ago • 42
Critique Fine-Tuning: Learning to Critique is More Effective than Learning to Imitate Paper • 2501.17703 • Published Jan 29 • 59