Exploiting Inter-Layer Expert Affinity for Accelerating Mixture-of-Experts Model Inference • Paper • 2401.08383 • Published Jan 16, 2024
The Case for Co-Designing Model Architectures with Hardware • Paper • 2401.14489 • Published Jan 25, 2024
Continual Pre-Training of Large Language Models: How to (re)warm your model? • Paper • 2308.04014 • Published Aug 8, 2023
Pythia: A Suite for Analyzing Large Language Models Across Training and Scaling • Paper • 2304.01373 • Published Apr 3, 2023
GPT-NeoX-20B: An Open-Source Autoregressive Language Model • Paper • 2204.06745 • Published Apr 14, 2022
BlackMamba: Mixture of Experts for State-Space Models • Paper • 2402.01771 • Published Feb 1, 2024