VoxRep: Enhancing 3D Spatial Understanding in 2D Vision-Language Models via Voxel Representation Paper • 2503.21214 • Published Mar 27 • 2