LLaVa-NeXT Collection LLaVa-NeXT (also known as LLaVa-1.6) improves upon the 1.5 series by incorporating higher image resolutions and more reasoning/OCR datasets. • 8 items • Updated Jul 19, 2024 • 34
Article Multimodal Embedding & Reranker Models with Sentence Transformers tomaarsen • Apr 9 • 59
Qwen2.5-VL Collection Vision-language model series based on Qwen2.5 • 10 items • Updated Mar 2 • 561
Article Making automatic speech recognition work on large files with Wav2Vec2 in 🤗 Transformers Narsil • Feb 1, 2022 • 16
Article Running Large Transformer Models on Mobile and Edge Devices tugrulkaya • Nov 3, 2025 • 13
Article Fine-tuning SmolLM with Group Relative Policy Optimization (GRPO) by following the Methodologies prithivMLmods • Feb 17, 2025 • 29
Article DeepSeek-R1 Dissection: Understanding PPO & GRPO Without Any Prior Reinforcement Learning Knowledge NormalUhr • Feb 7, 2025 • 293
Article Preference Optimization for Vision Language Models +2 qgallouedec, vwxyzjn, merve, kashif • Jul 10, 2024 • 93
Article Vision Language Model Alignment in TRL ⚡️ +3 sergiopaniego, merve, qgallouedec, kashif, ariG23498 • Aug 7, 2025 • 111
Article KV Caching Explained: Optimizing Transformer Inference Efficiency not-lain • Jan 30, 2025 • 337
Llama 3.2 Collection This collection hosts the transformers and original repos of the Llama 3.2 and Llama Guard 3 • 15 items • Updated Dec 6, 2024 • 673
Article Introducing Command A Vision: Multimodal AI built for Business CohereLabs • Jul 31, 2025 • 64
Article SmolVLM - small yet mighty Vision Language Model +3 andito, merve, mfarre, eliebak, pcuenq • Nov 26, 2024 • 418