Qwen/Qwen3-VL-235B-A22B-Instruct • Image-Text-to-Text • 236B • Updated 7 days ago • 76.2k • 325
Qwen/Qwen3-VL-235B-A22B-Thinking • Image-Text-to-Text • 236B • Updated 7 days ago • 6.71k • 333
Demons in the Detail: On Implementing Load Balancing Loss for Training Specialized Mixture-of-Expert Models • Paper • 2501.11873 • Published Jan 21 • 66