✨ Any-to-Any & World-Model: one step closer to the real world - BAAI Emu 3.5 - Ant Group Ming-flash-omni - HunyuanWorld-Mirror for 3D, aligning with the global "world model" trend
✨ Audio & Speech + Video & Visual: releases moving from entertainment labs to delivery platforms - SoulX-Podcast TTS - LongCat-Audio-Codec & LongCat-Video from delivery platform Meituan - xiabs DreamOmni 2
✨ 48B total / 3B active params - MIT license ✨ Up to 1M context ✨ 84.3 on RULER (128k) with a 3.98× speedup ✨ Hybrid KDA + MLA architecture for peak throughput & quality
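These specs describe a hybrid KDA + MLA mixture-of-experts checkpoint. Below is a minimal sketch of loading such a model with Hugging Face transformers; the repo id is an assumption for illustration, and the custom attention modules are expected to arrive via trust_remote_code.

```python
# Minimal sketch: loading a long-context hybrid-attention MoE checkpoint with
# Hugging Face transformers. The repo id below is an assumption for
# illustration; the hybrid KDA + MLA layers ship via trust_remote_code.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "moonshotai/Kimi-Linear-48B-A3B-Instruct"  # assumed repo id

tokenizer = AutoTokenizer.from_pretrained(repo_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    repo_id,
    torch_dtype=torch.bfloat16,   # 48B total params, only ~3B active per token (MoE)
    device_map="auto",
    trust_remote_code=True,       # pulls in the custom hybrid attention modules
)

prompt = "Summarize the key idea of hybrid linear attention in one sentence."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```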
✨ Compresses long sequences visually to bypass token limits ✨ Reduces computational and memory costs ✨ Preserves meaning through multimodal encoding ✨ Built on GLM-4.1V-9B-Base
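The core trick is rendering long text into image pages so the vision-language model reads pixels instead of raw tokens. Here is a minimal sketch of that rendering step with Pillow; the page size, font, and wrapping are illustrative assumptions, and the downstream VLM call is omitted.

```python
# Minimal sketch of the "compress text as pixels" idea: render a long passage
# onto an image page so a vision-language model (e.g. one built on
# GLM-4.1V-9B-Base) can consume many text tokens as a single image input.
# Page size, font, and margins here are illustrative assumptions.
import textwrap
from PIL import Image, ImageDraw, ImageFont

def render_text_page(text: str, width: int = 1024, height: int = 1024,
                     font_size: int = 18, margin: int = 24) -> Image.Image:
    page = Image.new("RGB", (width, height), "white")
    draw = ImageDraw.Draw(page)
    font = ImageFont.load_default()                 # swap in a real TTF for dense layouts
    chars_per_line = (width - 2 * margin) // (font_size // 2)
    y = margin
    for line in textwrap.wrap(text, width=chars_per_line):
        if y > height - margin:
            break                                    # this sketch renders a single page
        draw.text((margin, y), line, fill="black", font=font)
        y += font_size + 4
    return page

long_doc = "Replace this with a document longer than the raw token limit."
page_image = render_text_page(long_doc)
page_image.save("page_000.png")                     # feed pages to the VLM instead of raw tokens
```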
✨ Any prior in → 3D world out ✨ Mix camera poses, intrinsics, and depth as priors ✨ Predict point clouds, normals, Gaussians & more in one pass ✨ Unified architecture for all 3D tasks
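One way to picture the "any prior in, many 3D outputs out" design is a backbone that accepts optional prior tokens and decodes several heads in a single pass. The PyTorch sketch below illustrates only that pattern; the layer sizes, prior encodings, and head outputs are assumptions, not the HunyuanWorld-Mirror implementation.

```python
# Illustrative PyTorch sketch (not the HunyuanWorld-Mirror code) of the pattern:
# optional priors (camera pose, intrinsics, depth) are embedded as extra tokens,
# fused with image tokens in one backbone, and decoded by parallel heads.
import torch
import torch.nn as nn

class UnifiedPriorModel(nn.Module):
    def __init__(self, dim: int = 256):
        super().__init__()
        self.image_proj = nn.Linear(3 * 16 * 16, dim)          # naive 16x16 patch embed
        self.prior_proj = nn.ModuleDict({
            "camera": nn.Linear(12, dim),                       # flattened 3x4 extrinsics
            "intrinsics": nn.Linear(4, dim),                    # fx, fy, cx, cy
            "depth": nn.Linear(16 * 16, dim),                   # one depth patch
        })
        self.backbone = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(dim, nhead=8, batch_first=True), num_layers=2)
        self.heads = nn.ModuleDict({
            "points": nn.Linear(dim, 3),                        # per-token 3D point
            "normals": nn.Linear(dim, 3),                       # per-token surface normal
            "gaussians": nn.Linear(dim, 11),                    # mean + scale + rotation + opacity
        })

    def forward(self, patches, priors=None):
        tokens = [self.image_proj(patches)]
        for name, value in (priors or {}).items():              # any subset of priors works
            tokens.append(self.prior_proj[name](value))
        h = self.backbone(torch.cat(tokens, dim=1))
        img_h = h[:, :patches.shape[1]]                         # decode only image tokens
        return {name: head(img_h) for name, head in self.heads.items()}

model = UnifiedPriorModel()
patches = torch.randn(1, 64, 3 * 16 * 16)                       # 64 image patches
priors = {"intrinsics": torch.randn(1, 1, 4)}                   # only one prior available
outputs = model(patches, priors)
print({k: tuple(v.shape) for k, v in outputs.items()})
```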
✨ Trained on Honey-Data-15M, a 15M-sample SFT corpus with dual-level CoT reasoning ✨ Backed by HoneyPipe, a transparent & reproducible open data curation suite
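For a sense of what dual-level CoT supervision can look like, here is a mock SFT record with both a short and a long reasoning trace, plus a helper that picks one as the training target. The field names and the <think> formatting are assumptions for illustration, not the actual Honey-Data-15M schema.

```python
# Illustrative sketch of a dual-level CoT SFT record and how one might route it
# when building fine-tuning targets. Field names are assumptions, not the
# actual Honey-Data-15M schema.
def build_target(sample: dict, use_long_cot: bool) -> str:
    """Pick the short or long chain-of-thought trace as the SFT target."""
    reasoning = sample["long_cot"] if use_long_cot else sample["short_cot"]
    return f"<think>{reasoning}</think>\n{sample['answer']}"

sample = {  # mock record standing in for one of the ~15M SFT examples
    "question": "How many legs do three spiders have in total?",
    "short_cot": "3 spiders x 8 legs = 24.",
    "long_cot": "A spider has 8 legs. Three spiders therefore have 3 x 8 = 24 legs.",
    "answer": "24",
}
print(build_target(sample, use_long_cot=True))
```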